
IMPROVEMENTS OF DUPLICATION FREE RUN-LENGTH CODING FOR TEXTURED IMAGES

Mustafa S. Abdul Karim and KokSheik Wong

Faculty of Comp. Science & Info. Technology,
University of Malaya, Kuala Lumpur, Malaysia.
{mustafa.alwahaib@siswa., koksheik@}um.edu.my

ABSTRACT

Although duplication free run-length coding (DF-RLC) [1] encodes an image without increasing its file size, it has inconsistent performance and a low compression ratio. In this paper, we improve the performance of DF-RLC so that it is applicable to high-entropy textured images using a global coding mode. This mode relies on an improved rule-based coding method built on SAFA entropy coding with variable-length codewords. The generated codewords are assigned to intensity levels based on their probability of occurrence. These codewords can be generated in such a way that a pixel intensity level can be distinguished from a run, a distinction that traditional RLC requires. Experiments are carried out using standard textured and aerial images to verify the basic performance of the proposed method. The results suggest that the proposed method achieves a better compression ratio on textured images than DF-RLC [1], traditional RLC, and the Golomb-Rice and Huffman entropy codes. Moreover, the proposed method does not require the segmentation mode introduced in [1], and hence a unified global coding mode can be adopted.

Index Terms: Duplication Free Run-length Coding, SAFA code, lossless compression, textured images

Proceedings of the IIEEJ Image Electronics and Visual Computing Workshop 2012, Kuching, Malaysia, November 21-24, 2012

1. INTRODUCTION

The importance of image compression is widely reported and emphasized in the literature. The goal is to store an image at the highest possible quality with the smallest possible number of bits. The main concept of image compression is to remove redundant data so that the image is stored in a compact form. In medical and military images, the removal of such redundant data is restricted by the need to restore it perfectly at the decompression stage. This restriction is determined by the applications of such images, which tolerate no data loss. Therefore, these images are compressed by lossless compression methods. Lossless compression is based on modeling the image to search for data redundancy. Then, such redundancy is removed by entropy coding the modeled image. Entropy coding reduces the size of the modeled image by replacing its data with codewords of shorter length on average [2]. However, without any modeling, most entropy coding methods encode the image into a file larger than its original (raw) counterpart, i.e., no compression is gained. For example, Huffman coding in JPEG [3] and Golomb-Rice coding in JPEG-LS [4] cause such file-size increase, as observed in our experimental study (Section 4).

Run-Length Coding (RLC) or Run-Length Limiting [5], which is also part of JPEG [6] and the International Telecommunication Union (ITU) standard for facsimile communication [2], also causes such file-size increase. RLC encodes a sequence of pixels with the same intensity level by first recording the actual intensity level, followed by the number of occurrences, which is referred to as the run. For example, the sequence of pixels 9999995555555 is encoded by RLC as (9 6 5 7), where 6 and 7 are the runs for the intensity levels 9 and 5, respectively. RLC is highly efficient in encoding images with long runs of pixels of the same intensity level (such as black and white documents) but inefficient in encoding gray-scale images of high spatial activity [6]. Such inefficiency is due to the high variation in pixel intensity levels, in which case there are more intensity-run pairs to encode, which increases the file size. For example, a sequence of dissimilar pixels 1 2 3 4 5 will be encoded as (1 1 2 1 3 1 4 1 5 1) by RLC. Obviously, adding the run part increases the cardinality (size) of the set (doubling it in this case), and we name this negative effect the duplication problem.

To avoid the duplication problem, RLC is applied only to repeated pixels by introducing a special codeword to identify a run in [5, 2, 7]. However, these approaches lead to file-size increase, as reported in [1]. This problem is obvious in textured and aerial images encoded by traditional RLC due to the high spatial activity and short runs (of pixels with the same intensity level) in these images. In [1], we proposed duplication free run-length coding (DF-RLC), which overcomes the duplication problem. However, our method has inconsistent performance, as it has no gain (i.e., the input size is the same as the output size) for some images due to the use of lengthy SAFA codewords. Hence, such images are segmented into blocks and DF-RLC is applied individually to each block, which adds extra overhead and complexity.

In this paper, we overcome this problem by improving DF-RLC. The improvements include applying a different set of SAFA codewords, which are ending bits dependent (EBD) [1]. These codewords are used to encode pixels so that a non-repeated pixel (i.e., one whose intensity level differs from that of its neighboring pixels) is encoded by only one codeword, whereas a pixel repeated twice or more is encoded by two codewords: (1) an EBD codeword indicating the intensity level of the pixel; (2) a codeword that records the run-length of the repeated pixels. The improved DF-RLC is applied to un-modeled textured and aerial images. It eliminates the duplication problem and achieves compression for most of the test images considered without modeling them. This performance is consistently attained using a global coding mode. These improvements are verified empirically.
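The duplication problem of traditional RLC described in this section can be illustrated with a minimal Python sketch. The function name rlc_encode is hypothetical and chosen only for illustration; the two input sequences are the ones used in the text.

```python
from itertools import groupby


def rlc_encode(pixels):
    """Traditional RLC: emit an (intensity, run) pair for every run of equal pixels."""
    out = []
    for level, group in groupby(pixels):
        out.extend([level, len(list(group))])
    return out


long_runs = [9] * 6 + [5] * 7   # the sequence 9999995555555
no_runs = [1, 2, 3, 4, 5]       # dissimilar pixels

print(rlc_encode(long_runs))  # [9, 6, 5, 7]
print(rlc_encode(no_runs))    # [1, 1, 2, 1, 3, 1, 4, 1, 5, 1]
```

For the dissimilar sequence, the output holds twice as many symbols as the input, which is the duplication effect named above.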

2. OVERVIEW OF DF-RLC [1]

DF-RLC is based on coding the pixels by applying rule-based generative codewords referred to as SAFA codewords. Each SAFA codeword is derived from a general expression that satisfies the rules stipulated in Fig. 1. For example, the first three forms of the general expression are dxy, ddxy, and dxdy. A SAFA codeword is generated by assigning a combination of n bits (from a set P) to each term in a general expression. Here, P is the set of all possible binary sequences of length n, and its cardinality is $|P| = 2^n$. If P = {00, 01, 10, 11}, i.e., n = 2, then d can assume any of these combinations. For example, if d = 00, then x, y ∈ {01, 10, 11} according to Rules R4 and R5 in Fig. 1. Hence, the SAFA codewords that can be generated by assigning combinations from P to each term of the general expression dxy are 00 01 01, 00 01 10, 00 01 11, 00 10 01, and so on. These codewords are referred to as ending bits dependent (EBD) codewords. The other type is referred to as ending bits independent (EBI), which is generated in the same fashion but with y kept undefined. Hence, we have the codewords 00 01 y, 00 10 y, 00 11 y, and so on. Here, y is defined at the encoding stage. We exploit y in EBI to achieve DF-RLC by mapping a pixel to an EBI codeword according to the probability of occurrence of the intensity level of that pixel in the histogram of the image. Here, intensity levels of high occurrence are assigned to short EBI codewords, and vice versa. After that, if the pixel is not repeated or is repeated only once (i.e., run-length = 2), then y in the codeword is set to state 1 or 2, respectively. If the pixel is repeated more than once (i.e., run-length of 3 or more), then y is set to state 3 and the following codeword is an EBD that records the actual run-length of the pixel.

Fig. 1. Set of Rules in SAFA Code

3. IMPROVED DF-RLC

In this paper, we improve DF-RLC [1] in two aspects: (1) the method for generating EBD codewords; (2) the method for applying DF-RLC to pixels. These improvements are detailed in the following subsections.

3.1. Set of EBD Codewords

In this section, we present the improved method to generate EBD codewords. The set is generated by assigning combinations of bits to each term of the general expression, as shown in Section 2. However, d can take only $2^n - 1$ combinations from the set P, because one combination is reserved as the run flag (RF). For example, if 11 is reserved as RF from the set P, then d ∈ {00, 01, 10}. Hence, a codeword will never start with RF. However, x and y can assume the value of RF because RF causes no conflict with Rules R4 and R5 in Fig. 1. Hence, using the general expression dxy, if RF = 11, then we have the codewords 00 01 01, 00 01 10, 00 01 11, ..., 01 00 00, 01 00 10, ..., 10 11 11, where 10 11 11 is the last codeword that can be generated using the general expression dxy. Hence, the maximum number of codewords that can be derived from a particular general expression is $(2^n - 1)^3$. To generate more codewords, the next general expressions, namely ddxy, dxdy, dddxy, etc., are utilized by applying a similar assignment process.
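The enumeration for the general expression dxy can be sketched in Python as follows. Since Fig. 1 is not reproduced here, the sketch assumes that Rules R4 and R5 amount to x ≠ d and y ≠ d, which is inferred from the example d = 00, x, y ∈ {01, 10, 11}; the function name ebd_codewords_dxy and the lexicographic enumeration order are likewise assumptions made for illustration.

```python
from itertools import product


def ebd_codewords_dxy(n=2, rf="11"):
    """Enumerate EBD codewords of the form dxy over n-bit combinations.

    Assumption: Rules R4 and R5 are modeled as x != d and y != d.
    d never takes the reserved run flag (RF); x and y may.
    """
    combos = ["".join(bits) for bits in product("01", repeat=n)]  # the set P
    words = []
    for d in combos:
        if d == rf:
            continue  # a codeword must not start with RF
        others = [p for p in combos if p != d]
        for x in others:
            for y in others:
                words.append((d, x, y))
    return words


words = ebd_codewords_dxy(n=2, rf="11")
print(len(words))   # 27
print(words[:3])    # [('00', '01', '01'), ('00', '01', '10'), ('00', '01', '11')]
print(words[-1])    # ('10', '11', '11')
```

Under these assumptions, the count for n = 2 equals $(2^2 - 1)^3 = 27$, and the first and last codewords match those listed in the text.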

3.2. Improved Encoding Method by DF-RLC

The set of modified EBD codewords is used to encode the pixels, which are scanned in raster order in two rounds (i.e., two passes). In the first round, the histogram of the image is computed, a lookup table is generated, and the maximum run-length (MRL) is determined. The lookup table consists of the list of EBD codewords, each assigned to an intensity level. The assignment follows the probability of occurrence of the intensity levels, such that intensity levels of higher occurrence (in the histogram) are assigned to shorter EBD codewords, and vice versa.

MRL is the maximum length of a sequence of pixels all having the same intensity level. The run-lengths in the image are encoded using fixed-length natural binary codes (NBCs). The length of an NBC is denoted by m and computed as follows:

$$m = \lceil \log_2(\mathrm{MRL}) \rceil \ \text{bits} \qquad (1)$$

where $\lceil S \rceil$ rounds S up to the nearest integer (i.e., towards positive infinity).
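The first round can be sketched in Python as follows. This is only an illustrative sketch: the function name first_pass is hypothetical, the codeword list is passed in already ordered from shortest to longest, and ties between equally frequent intensity levels are broken by first appearance, which is an assumption not stated in the text.

```python
import math
from collections import Counter
from itertools import groupby


def first_pass(pixels, codewords):
    """Build the lookup table and compute MRL and m as in Eq. (1)."""
    histogram = Counter(pixels)
    # Most frequent level first; ties keep first-appearance order (assumption).
    ranked = sorted(histogram, key=lambda level: -histogram[level])
    lookup = dict(zip(ranked, codewords))
    mrl = max(len(list(group)) for _, group in groupby(pixels))
    m = math.ceil(math.log2(mrl))
    return lookup, mrl, m


# First three dxy codewords with RF = 11, written without spaces.
codewords = ["000101", "000110", "000111"]
lookup, mrl, m = first_pass([5, 9, 9, 9, 6], codewords)
print(lookup)   # {9: '000101', 5: '000110', 6: '000111'}
print(mrl, m)   # 3 2
```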

In the second round, the pixels are re-scanned in raster order, and two encoding modes are adaptively adopted. If the encoder encounters a non-repeating pixel, then mode I is applied. In this mode, the pixel is encoded by a single EBD codeword, i.e., a codeword that is not followed by the reserved RF bits. Here, encoding is achieved by replacing the intensity level of the pixel with its EBD codeword (according to the lookup table), and the encoder proceeds to the next pixel.

If the encoder encounters a run of pixels (i.e., two or more adjacent pixels of the same intensity level), then mode II is invoked. In this mode, the intensity level of the run is first encoded by a single EBD codeword (according to the lookup table). This EBD codeword is followed by the reserved RF, which has a constant length of two bits. The RF is in turn followed by a fixed-length NBC of m bits, which records the actual run-length of these pixels. Table 1 summarizes the two encoding modes of the improved DF-RLC.

3.3. Example of Encoding/Decoding

In this section, we present an example of encoding and decoding pixels by the proposed method. Assume that a sequence of pixels of intensity levels (5, 9, 9, 9, 6) is to be encoded. First, the lookup table is generated, and the first three generative EBD codewords are assigned to these intensity levels. Table 2 presents the generated lookup table, where the first EBD codeword is assigned to 9 because it has the maximum occurrence among the intensity levels, whereas the third EBD codeword is assigned to 6 because it occurs only once in this example. After that, the intensity level of each pixel is replaced by its corresponding EBD codeword in Table 2. In this example, we reserve the combination 11 as RF. Since MRL = 3, m = 2 bits as computed by Eq. (1).

After that, the pixels are re-scanned and the encoding mode is adaptively chosen. Since 5 is not repeated, it is replaced with a single codeword (00 01 10). However, 9 is repeated three times. Hence, this run is encoded as follows: the intensity level of the run is replaced with (00 01 01) as shown in Table 2, followed by RF = 11, which is followed by the actual run-length count (01). Here, 01 represents the run of 3 pixels because 00 is reserved to record a run of two pixels, 10 records a run of 4 pixels, and so on, whereas the case of a single pixel (i.e., not repeated) is handled by mode I (Table 1). The complete output codeword for the run is (00 01 01) (11 01). To encode 6, the codeword (00 01 11) is used because 6 is not repeated in this example. Hence, the sequence of pixels (5, 9, 9, 9, 6) is encoded as (00 01 10, 00 01 01 11 01, 00 01 11) by the improved DF-RLC.

Table 1. Summary of encoding modes

Mode  Syntax
I     EBD
II    EBD, RF, "actual run-length"

Table 2. Example of a lookup table, which assigns EBD codewords to intensity levels according to their occurrences in the histogram

Intensity Level  EBD
9                00 01 01
5                00 01 10
6                00 01 11

If the pixels in this example are assumed to be gray-scale, i.e., 8 bits are required for each original pixel, then 40 bits in total are required for these 5 pixels. After applying the proposed method, the same sequence of pixels is encoded by only 22 bits. Hence, the compression ratio in this example is 40/22 ≈ 1.82, excluding the overhead of the header information. For lossless decompression, the sorted intensity levels in Table 2 are recorded in the header of the image along with RF and m.
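The second round and the bit count of this example can be reproduced with a short Python sketch. The names LOOKUP, RF, M and encode are hypothetical; the lookup table is copied from Table 2, and the run-length field stores the run minus 2 in m bits, following the convention of the example (00 for a run of two, 01 for a run of three, and so on).

```python
from itertools import groupby

LOOKUP = {9: "000101", 5: "000110", 6: "000111"}  # Table 2, spaces removed
RF = "11"                                          # reserved run flag
M = 2                                              # NBC length from Eq. (1)


def encode(pixels, lookup=LOOKUP, rf=RF, m=M):
    """Mode I: EBD only. Mode II: EBD, RF, then the run-length in m bits."""
    parts = []
    for level, group in groupby(pixels):
        run = len(list(group))
        if run == 1:
            parts.append(lookup[level])                               # mode I
        else:
            parts.append(lookup[level] + rf + format(run - 2, f"0{m}b"))  # mode II
    return parts


parts = encode([5, 9, 9, 9, 6])
bitstream = "".join(parts)
ratio = (8 * 5) / len(bitstream)   # 8 bits per original gray-scale pixel

print(parts)             # ['000110', '0001011101', '000111']
print(len(bitstream))    # 22
print(round(ratio, 2))   # 1.82
```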

At the decoding stage, Table 2 is re-generated by reading the header of the image. Then, each EBD codeword is mapped back to its corresponding intensity level in Table 2. Here, the decoder examines the two bits that follow each EBD codeword. If these two bits are the RF, then the current intensity level is repeated, and the actual run-length is found by reading the next m bits after the RF. Hence, two or more pixels (depending on the length of the run) are restored. If the two bits that follow the EBD codeword are not the RF, then the intensity level is not repeated, and hence only one pixel is restored. This process is repeated until the entire image is reconstructed.
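The decoding procedure can be sketched in Python for the example bitstream. The sketch makes a simplifying assumption that is not part of the method: every EBD codeword has the form dxy with n = 2 and is therefore exactly 6 bits long, so codeword boundaries are known in advance. In general, the boundaries follow from the rules in Fig. 1, which are not modeled here. The function name decode and its parameters are hypothetical.

```python
def decode(bits, lookup, rf="11", m=2, word_len=6):
    """Restore pixels from a bitstream of fixed-length EBD codewords.

    Assumption: all EBD codewords are word_len bits long (dxy form, n = 2).
    """
    inverse = {codeword: level for level, codeword in lookup.items()}
    pixels, pos = [], 0
    while pos < len(bits):
        level = inverse[bits[pos:pos + word_len]]
        pos += word_len
        if bits[pos:pos + len(rf)] == rf:
            # Run flag found: the next m bits store the run-length minus 2.
            pos += len(rf)
            run = int(bits[pos:pos + m], 2) + 2
            pos += m
        else:
            run = 1  # no run flag: a single, non-repeated pixel
        pixels.extend([level] * run)
    return pixels


table_2 = {9: "000101", 5: "000110", 6: "000111"}
restored = decode("0001100001011101000111", table_2)
print(restored)  # [5, 9, 9, 9, 6]
```

The check for RF is unambiguous because no EBD codeword starts with RF, so the two bits after a codeword are either RF or the start of the next codeword.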

4. EXPERIMENTAL RESULTS

The proposed improved DF-RLC is implemented in C/C++ and MATLAB and tested on a set of gray-scale images of size 512×512 pixels. These images are textured and aerial images selected randomly from the standard image database in [8]. Fig. 2 shows the set of test images for reference purposes.

Note that this empirical study is limited by: (a) applying the proposed method to textured images only; (b) comparing the performance of the proposed method with the original DF-RLC [1], traditional RLC, binary Huffman coding and Golomb-Rice coding, where the length of the remainder part in Golomb-Rice codewords is chosen to be the one that achieves the best compression ratio. For a fair comparison, all these methods (except RLC) are used to encode the intensity levels of the test images according to their probabilities of occurrence (PoO), as detailed in Section 2.

Fig. 2. The set of test images [8]: panels (a) to (q) show Image 1 to Image 17.

The performance of the proposed method is verified from two aspects: (a) the ability of the proposed method to encode the test images without increasing their file sizes (i.e., the size of the encoded image is the same as or less than its size before encoding); (b) the compression gained by applying the proposed method, measured as the ratio of the size of the un-encoded original image (i.e., raw) to its size after encoding.

The first aspect of the performance is verified by looking at the compression ratio (CR). An encoding method does not cause the duplication problem if it encodes an image at CR ≥ 1. Table 3 presents the CRs for the test images encoded by our proposed method and other existing coding methods. It is noticed that the proposed method attains CR > 1 for all images except images 7, 8 and 11, whose high spatial activity requires encoding them with lengthy EBD codewords. On the other hand, only 2 images (i.e., 1 and 13) achieve CR > 1 with DF-RLC [1], whereas the rest of the images are encoded at CR < 1, which indicates that their sizes are increased by the encoding. Such increase is due to the use of lengthy EBD and EBI codewords in [1]. Similar performance is observed for images encoded by traditional RLC, which attains CR < 1 for all images except image 6. The CR of this image is 1, which indicates that no compression is gained but no duplication problem occurred either. Both Huffman and Golomb-Rice codes fail to overcome the duplication problem due to the lengthy codewords of these two entropy coding methods. Therefore, these two entropy coding methods should be preceded by some image modeling method to gain compression. These results suggest that the proposed method outperforms the other entropy coding methods in encoding textured images when modeling is not involved.

Table 3. Compression ratios obtained by applying the proposed method to the test images, compared with other entropy coding methods

Image  Proposed  DF-RLC [1]  RLC    Huffman  GRC
1      1.102     1.028       0.985  0.711    0.110
2      1.052     0.899       0.985  0.841    0.316
3      1.069     0.931       0.984  0.805    0.412
4      1.109     0.966       0.984  0.776    0.356
5      1.046     0.892       0.984  0.833    0.378
6      1.075     0.915       1      0.810    0.288
7      0.974     0.801       0.985  0.999    0.543
8      0.994     0.818       0.988  0.899    0.501
9      1.048     0.883       0.985  0.847    0.369
10     1.059     0.892       0.985  0.842    0.324
11     0.974     0.808       0.985  0.994    0.503
12     1.066     0.908       0.988  0.835    0.321
13     1.113     1.023       0.984  0.715    0.153
14     1.032     0.973       0.984  0.798    0.685
15     1.016     0.851       0.985  0.859    0.376
16     1.053     0.895       0.985  0.832    0.356
17     1.074     0.924       0.985  0.816    0.314

For the second aspect, only our proposed method and [1] achieve CR > 1. All other entropy coding methods fail to gain compression, i.e., their CR values are either less than 1 (file-size increase) or equal to 1 (no compression is gained). This suggests that the proposed improved DF-RLC outperforms the aforementioned entropy coding methods (including [1]) in terms of image compression without any modeling.
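The two aspects discussed above can be tallied directly from the values reported in Table 3. The following Python sketch only restates the table; the names TABLE_3, METHODS and images_where are hypothetical helpers introduced for illustration.

```python
# Compression ratios from Table 3: (Proposed, DF-RLC [1], RLC, Huffman, GRC)
TABLE_3 = {
    1: (1.102, 1.028, 0.985, 0.711, 0.110),
    2: (1.052, 0.899, 0.985, 0.841, 0.316),
    3: (1.069, 0.931, 0.984, 0.805, 0.412),
    4: (1.109, 0.966, 0.984, 0.776, 0.356),
    5: (1.046, 0.892, 0.984, 0.833, 0.378),
    6: (1.075, 0.915, 1.000, 0.810, 0.288),
    7: (0.974, 0.801, 0.985, 0.999, 0.543),
    8: (0.994, 0.818, 0.988, 0.899, 0.501),
    9: (1.048, 0.883, 0.985, 0.847, 0.369),
    10: (1.059, 0.892, 0.985, 0.842, 0.324),
    11: (0.974, 0.808, 0.985, 0.994, 0.503),
    12: (1.066, 0.908, 0.988, 0.835, 0.321),
    13: (1.113, 1.023, 0.984, 0.715, 0.153),
    14: (1.032, 0.973, 0.984, 0.798, 0.685),
    15: (1.016, 0.851, 0.985, 0.859, 0.376),
    16: (1.053, 0.895, 0.985, 0.832, 0.356),
    17: (1.074, 0.924, 0.985, 0.816, 0.314),
}
METHODS = ("Proposed", "DF-RLC [1]", "RLC", "Huffman", "GRC")


def images_where(predicate, column):
    """Return the image numbers whose CR in the given column satisfies predicate."""
    return [img for img, row in TABLE_3.items() if predicate(row[column])]


# Aspect (a): no file-size increase (CR >= 1). Aspect (b): compression (CR > 1).
no_increase = {name: images_where(lambda cr: cr >= 1, i) for i, name in enumerate(METHODS)}
gain = {name: images_where(lambda cr: cr > 1, i) for i, name in enumerate(METHODS)}

print(len(gain["Proposed"]))    # 14
print(gain["DF-RLC [1]"])       # [1, 13]
print(no_increase["RLC"])       # [6]
print(gain["Huffman"], gain["GRC"])  # [] []
```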


5. CONCLUSIONS

The performance of DF-RLC [1] was improved by modifying the applied SAFA codewords and the encoding method of DF-RLC. The improved method was applied to textured images, which are of high entropy and high spatial activity. Although no modeling is applied to locate redundancy in the images, the improved method showed better performance than [1] and the existing entropy coding methods considered in terms of compression ratio. Moreover, the improved method resulted in the lowest duplication rate among the methods considered. Also, the proposed improvement operates in a single (global) mode, unlike the original method [1], which relies on the statistics of smaller blocks (blocking mode). As future work, we would like to include image modeling to further enhance the compression ratio.

6. REFERENCES

[1] Mustafa Safa Al-Wahaib and KokSheik Wong, “A lossless image compression algorithm using duplication free run-length coding,” in Proceedings of the Second International Conference on Network Applications, Protocols and Services, 2010, NETAPPS ’10, pp. 245–250.

[2] Roberto Togneri and Christopher J. S. DeSilva, Fundamentals of Information Theory and Coding Design, CRC Press, Inc., Boca Raton, FL, USA, 2003.

[3] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, Sept. 1952.

[4] S. Golomb, “Run-length encodings (corresp.),” IEEE Transactions on Information Theory, vol. 12, no. 3, pp. 399–401, Jul. 1966.

[5] Jon Louis Bentley, Daniel D. Sleator, Robert E. Tarjan, and Victor K. Wei, “A locally adaptive data compression scheme,” Commun. ACM, vol. 29, no. 4, pp. 320–330, Apr. 1986.

[6] Athanassios Skodras, Charilaos Christopoulos, and Touradj Ebrahimi, “The JPEG 2000 still image compression standard,” IEEE Signal Processing Magazine, vol. 18, pp. 36–58, 2001.

[7] David Salomon, Data Compression: The Complete Reference, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[8] The USC-SIPI Image Database. [On-Line]: http://sipi.usc.edu/database/.