High Speed Lossless Data Compression


PRESENTED BY:

www.final-yearprojects.co.cc

INTRODUCTION

Data compression is a technique for reducing data redundancy in order to preserve the bandwidth of a communication channel and to increase the capacity of data storage, which in turn improves overall network performance.

Lossless data compression is necessary to handle the storage and retrieval of enormous amounts of digital data. It also finds applications in high-speed communication networks, where it preserves the bandwidth of both wired and wireless channels.

A lossless data compression system guarantees that the data at the decoder output is exactly identical to the data at the encoder input.

LOSSLESS DATA COMPRESSION METHODS

1. Huffman coding
2. Run-length encoding
3. Arithmetic coding
4. Lempel-Ziv algorithms, such as LZ77 and LZW
5. X-Match Pro algorithm

DISADVANTAGES OF STATISTICAL APPROACHES

1. Huffman coding and other statistical approaches are not good options for modern-day communications because network traffic is neither predictable nor consistent.

2. Software implementations of these algorithms are cost-effective only for low to moderately high link-speed connections.

3. Software solutions are not well suited to real-time applications such as network routers, where compression and decompression must be performed on the fly at high throughput.

LZ77 ENCODING TECHNIQUE

The LZ77 algorithm, also called LZ1, was proposed by Ziv and Lempel in 1977.

It is a sequential algorithm that compresses variable-length strings of binary bytes into a fixed-length compressed format.

The two important steps in the algorithm are string parsing and coding.

The characters or symbols are elements of the alphabet A, which in our case is the set of extended ASCII characters, i.e. 256 distinct symbols, each one byte long.

Repeating phrases in the incoming data are replaced with fixed-length code words.
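Because every code word has the same layout (pointer, match length, last symbol), its length follows directly from the buffer sizes. The sketch below assumes, as in the example of Fig. 1, that the pointer addresses the 2n - Ls positions of the search buffer, that the length field stores values 0 to Ls - 1 (one symbol is always left over as the last symbol), and that all fields are written in the source alphabet of size alpha. The function names are hypothetical.

```python
def digits_needed(count: int, alpha: int) -> int:
    """Smallest d with alpha**d >= count, i.e. ceil(log_alpha(count)) using integers only."""
    d, capacity = 0, 1
    while capacity < count:
        capacity *= alpha
        d += 1
    return d


def codeword_length(window: int, lookahead: int, alpha: int) -> int:
    """Fixed code word length in symbols: pointer digits + length digits + 1 last symbol."""
    search = window - lookahead                       # positions where a match may start
    pointer_digits = digits_needed(search, alpha)
    length_digits = digits_needed(lookahead, alpha)   # match lengths 0 .. Ls-1
    return pointer_digits + length_digits + 1


# Example parameters of Fig. 1: alphabet {0, 1, 2}, 2n = 18, Ls = 9
print(codeword_length(18, 9, 3))  # -> 5
```

With these parameters every code word is 2 + 2 + 1 = 5 symbols long, which matches the five-symbol code words C1 to C4 of the example.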

The compression ratio is CR = (length of original data - length of code) / (length of original data).
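A small worked instance of this formula. Reading the match lengths encoded in the four code words of Fig. 1 (3, 3, 7 and 8, each followed by one last symbol), C1 to C4 together cover 4 + 4 + 8 + 9 = 25 input symbols using 4 x 5 = 20 code symbols. The helper name is hypothetical.

```python
def compression_ratio(original_len: int, code_len: int) -> float:
    """CR as defined above: (original length - code length) / original length."""
    return (original_len - code_len) / original_len


# Fig. 1: symbols covered by C1..C4 (match length + 1 last symbol each)
symbols_covered = [3 + 1, 3 + 1, 7 + 1, 8 + 1]
code_symbols = 5 * len(symbols_covered)   # four fixed-length 5-symbol code words
print(compression_ratio(sum(symbols_covered), code_symbols))  # -> 0.2
```

Under this definition a larger CR means more savings; a CR of 0 means the code is as long as the original data.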

Compression method

The algorithm employs the sliding-window principle: the data to be compressed is pumped into a buffer of length 2n symbols.

Initially, the first (left) half of the buffer, referred to as the search buffer, is filled with zeros.

The second half of the buffer, of length L, holds the data that needs to be encoded and is referred to as the new or look-ahead buffer.

FOR EXAMPLE: Consider a string over an alphabet of 3 symbols (0, 1, 2) to which LZ77 (LZ1) compression is applied.

S = 0001010210210212021021200... (input string)
Ls = 9 (length of the look-ahead buffer)
2n = 18 (window size)

Fig 1: LZ1 compression

Search buffer   Look-ahead buffer   Code word
000000000       000101021           C1 = 22101
000000001       010210210           C2 = 21102
000010102       102102120           C3 = 20212
210210212       021021200           C4 = 02220

To apply LZ1 compression, the longest match for the string starting at the first position of the look-ahead buffer is found in the search buffer.

As shown in Fig. 1, the longest match is the string "000", of length 3.

The match can start from any position (0-8) of the search buffer, but it can extend into the look-ahead buffer.

Every start position yields a match of length 3 here, so for convenience position 8 (8 in base 10 = 22 in base 3) is chosen as the pointer where the match starts. The length of the match is 3 (3 in base 10 = 10 in base 3), and the last symbol after the match is 1.

The code word is formed by concatenating "pointer length last-symbol".

So the first code word is 22 10 1 (C1 = 22101 in Fig. 1). The code word is usually represented in the same alphabet as the source data.
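The matching and code-word steps can be sketched in Python for a single buffer state. This is a minimal sketch under stated assumptions: symbols are the characters 0 to 2, ties between equally long matches go to the rightmost start position (which reproduces the choice of position 8 for C1), and the match is capped at Ls - 1 symbols so that a last symbol always remains in the look-ahead buffer. The names `longest_match`, `to_base` and `codeword` are hypothetical.

```python
def longest_match(search: str, lookahead: str):
    """Return (pointer, length, last_symbol) for one LZ1 step."""
    window = search + lookahead
    best_pos, best_len = 0, 0
    for pos in range(len(search)):
        length = 0
        # the match may run on into the look-ahead buffer, but one
        # look-ahead symbol must remain to be sent as the last symbol
        while length < len(lookahead) - 1 and window[pos + length] == lookahead[length]:
            length += 1
        if length >= best_len:          # ">=" keeps the rightmost of equal matches
            best_pos, best_len = pos, length
    return best_pos, best_len, lookahead[best_len]


def to_base(value: int, alpha: int, width: int) -> str:
    """Write value in base alpha with exactly width digits, most significant first."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, alpha)
        digits.append(str(d))
    return "".join(reversed(digits))


def codeword(search: str, lookahead: str, alpha: int = 3, width: int = 2) -> str:
    pos, length, last = longest_match(search, lookahead)
    return to_base(pos, alpha, width) + to_base(length, alpha, width) + last


# Buffer contents of the four rows of Fig. 1
rows = [("000000000", "000101021"), ("000000001", "010210210"),
        ("000010102", "102102120"), ("210210212", "021021200")]
print([codeword(s, la) for s, la in rows])
# -> ['22101', '21102', '20212', '02220']
```

All four code words of Fig. 1 are reproduced, including C4, whose match of length 8 starts at position 2 and runs five symbols into the look-ahead buffer.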

After the code word is formed, the buffer is shifted left by length + 1 positions, and the look-ahead buffer is refilled from the right with the same number of incoming symbols.
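A minimal sketch of this shift step, assuming the buffer is held as a string and that `consumed` (a hypothetical name, as is `shift`) counts how many input symbols have already entered the buffer. Starting from the first row of Fig. 1 and shifting by the match lengths encoded in C1, C2 and C3 (3, 3 and 7), plus one symbol each, reproduces the remaining buffer rows of the figure.

```python
def shift(search: str, lookahead: str, source: str, consumed: int, steps: int):
    """Shift the 2n-symbol buffer left by `steps` and refill it from `source`.

    Returns the new search buffer, look-ahead buffer and updated `consumed`.
    If the source runs out, the look-ahead buffer simply gets shorter.
    """
    n = len(search)
    buffer = search + lookahead + source[consumed:consumed + steps]
    buffer = buffer[steps:]                    # the oldest `steps` symbols fall out
    return buffer[:n], buffer[n:], consumed + steps


S = "0001010210210212021021200"
search, lookahead, consumed = "0" * 9, S[:9], 9   # first row of Fig. 1
for match_len in (3, 3, 7):                      # lengths encoded in C1, C2, C3
    search, lookahead, consumed = shift(search, lookahead, S, consumed, match_len + 1)
    print(search, lookahead)
# 000000001 010210210
# 000010102 102102120
# 210210212 021021200
```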

The algorithm looks at the data through a window of fixed size. Anything outside this window can neither be referenced nor encoded.

As more data is encoded, the window slides along, removing the oldest encoded data from the window and adding newly encoded data to it.
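Decoding reverses the process: each code word is split into pointer, length and last symbol, the decoder copies `length` symbols starting at `pointer` in its own search buffer, and then appends the last symbol. The copy is done symbol by symbol so it may read symbols produced earlier in the same step, mirroring how a match can extend into the look-ahead buffer. The sketch assumes the code-word format of the example (two base-3 digits each for pointer and length, a zero-filled 9-symbol search buffer); `lz1_decode` is a hypothetical name. Decoding C1 to C4 of Fig. 1 returns the first 25 symbols of S, illustrating that the decoder output equals the encoder input.

```python
def lz1_decode(codewords, search_size=9, alpha=3, width=2, fill="0"):
    """Rebuild source symbols from code words: pointer + length + last symbol."""
    # Whole history kept for simplicity; hardware would keep only the window.
    history = [fill] * search_size            # decoder's search buffer, zero-filled
    for cw in codewords:
        pointer = int(cw[:width], alpha)       # e.g. "22" in base 3 -> 8
        length = int(cw[width:2 * width], alpha)
        last = cw[2 * width]
        start = len(history) - search_size    # first index of the current search buffer
        for k in range(length):
            # may read symbols appended earlier in this same step (overlapping copy)
            history.append(history[start + pointer + k])
        history.append(last)
    return "".join(history[search_size:])     # drop the initial zero fill


print(lz1_decode(["22101", "21102", "20212", "02220"]))
# -> 0001010210210212021021200
```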

Algorithm

while (look-ahead buffer not empty) {
    get a reference (position, length) to the longest match;
    if (length > 0) {
        output (position, length, next symbol);
        shift the window length + 1 positions along;
    } else {
        output (0, 0, first symbol in the look-ahead buffer);
        shift the window 1 character along;
    }
}
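The pseudocode above can be turned into a small runnable Python sketch. Assumptions not taken from the slides: symbols are characters, the search buffer is pre-filled with '0', ties between equally long matches go to the rightmost position, match lengths are capped at one less than the look-ahead size, and code words are two base-alpha digits of position, two of length, and the next symbol. For simplicity the sketch indexes into one long string instead of physically shifting a 2n-symbol buffer; advancing the index `i` by length + 1 plays the role of the shift. All names are hypothetical.

```python
def to_base(value: int, alpha: int, width: int) -> str:
    """Fixed-width base-alpha representation, most significant digit first."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, alpha)
        digits.append(str(d))
    return "".join(reversed(digits))


def lz1_encode(data: str, search_size: int = 9, lookahead_size: int = 9,
               alpha: int = 3, width: int = 2, fill: str = "0") -> list:
    window = fill * search_size + data   # search buffer initially filled with zeros
    codes = []
    i = search_size                      # index of the first look-ahead symbol
    while i < len(window):               # while the look-ahead buffer is not empty
        ahead = min(lookahead_size, len(window) - i)
        best_pos, best_len = 0, 0
        for pos in range(search_size):   # every start position in the search buffer
            start = i - search_size + pos
            length = 0
            # the match may run into the look-ahead buffer but must leave
            # at least one symbol to send as the next symbol
            while length < ahead - 1 and window[start + length] == window[i + length]:
                length += 1
            if length >= best_len:       # ">=" prefers the rightmost position
                best_pos, best_len = pos, length
        if best_len > 0:
            codes.append(to_base(best_pos, alpha, width)
                         + to_base(best_len, alpha, width) + window[i + best_len])
            i += best_len + 1            # shift the window length + 1 positions
        else:
            codes.append(to_base(0, alpha, width) * 2 + window[i])
            i += 1                       # shift the window 1 symbol
    return codes


print(lz1_encode("0001010210210212021021200"))
# -> ['22101', '21102', '20212', '02220']
```

Each pass of the outer loop performs up to search_size x (Ls - 1) symbol comparisons one after another, which is the serial bottleneck that the hardware design below sets out to remove.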

LZ1 DESIGN METHODOLOGY

LZ1 data compression is a sequential algorithm that compresses strings of binary bytes of variable length into a fixed length compressed format.

The major step in data compression is the reduction of repeated strings of incoming data into compact code words.

This involves comparing symbols with each other to find matching symbols and, ultimately, the longest match.

Software solutions are limited by processor speed in achieving high throughputs; therefore, hardware solutions become necessary.

To achieve high speeds, we propose translating the serial comparisons into a parallel architecture, thus achieving higher throughputs.

The required degree of parallelism can be achieved by also providing hardware for future comparisons, which implies looking ahead, or unfolding the hardware, to speed up the comparison operation.

The look-ahead buffer holds the data to be compressed, while the already encoded data is present in the search buffer.

The sizes of the two buffers are critical to obtaining a good compression ratio.

Comparing compression ratios for different buffer sizes shows that the larger the search and look-ahead buffers, the better the compression performance.

On the other hand, increasing the size of these buffers not only increases the area requirements of the hardware design but also increases the critical path delay of the hardware, resulting in lower throughput.

Thus, it is imperative that the design be easily scalable.

Hardware Implementation

For the hardware implementation, a total buffer of size 2n is considered, where the first half contains the n symbols (x0, x1, ..., xn-1) that have recently been coded.

The second half of the buffer holds the next n symbols (y0, y1, ..., yn-1) that are yet to be coded.

For illustration, consider n = 4. The symbols belong to the set of 256 extended ASCII characters, and each symbol is one byte (8 bits) long.

Parallel comparisons of source symbols are shown in Table 1.
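The parallel comparisons can be modelled in software as a matrix of independent byte equality tests. This sketch is an assumption about one possible arrangement and does not reproduce Table 1: entry E[i][j] compares symbol i + j of the 2n-symbol buffer (x followed by y) with y[j], so all n x n comparisons are independent and can be evaluated in the same cycle, and the match length for start position i is the number of leading hits in row i. No cap for the trailing last symbol is applied here. The function names are hypothetical.

```python
def comparison_matrix(x: bytes, y: bytes):
    """E[i][j] is True when (x + y)[i + j] == y[j]; every entry is independent."""
    window = x + y
    n = len(x)
    return [[window[i + j] == y[j] for j in range(n)] for i in range(n)]


def match_lengths(matrix):
    """Match length for each start position i = number of leading True entries in row i."""
    lengths = []
    for row in matrix:
        length = 0
        for hit in row:
            if not hit:
                break
            length += 1
        lengths.append(length)
    return lengths


# n = 4: x = already coded bytes, y = bytes still to be coded
print(match_lengths(comparison_matrix(b"xabc", b"abca")))  # -> [0, 4, 0, 0]
```

In the example the match starting at x1 runs on into the y buffer, just as an LZ1 match may extend into the look-ahead buffer.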

Scalable Architecture for LZ1 Algorithm

Working

The BMC select logic is the most critical part.

The column-select logic starts from the leftmost BMC and, based on its L, jumps to the L-th column to the right; from that column's L it selects the next column, skipping the columns not required by the algorithm.

The minimum throughput of the architecture equals the amount of unfolding, i.e. the number of BMCs, in bytes per cycle, whereas the best case depends on the length L of the last BMC.
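As a rough, illustrative calculation: if the n BMCs retire n bytes per clock cycle in the worst case, the minimum throughput is 8 x n x f bit/s for a clock frequency f. The 100 MHz clock below is a hypothetical value, not a figure from this design; the second function gives the clock rate that would be needed for the worst case to reach the 1 Gbit/s mark mentioned in the conclusions. Function names are hypothetical.

```python
def min_throughput_gbps(num_bmcs: int, clock_hz: float) -> float:
    """Worst-case throughput in Gbit/s: num_bmcs bytes (8 bits each) per clock cycle."""
    return num_bmcs * 8 * clock_hz / 1e9


def clock_for_target_hz(num_bmcs: int, target_bps: float) -> float:
    """Clock frequency at which the worst case reaches target_bps."""
    return target_bps / (num_bmcs * 8)


print(min_throughput_gbps(4, 100e6))      # hypothetical 100 MHz clock -> 3.2
print(clock_for_target_hz(4, 1e9) / 1e6)  # MHz needed for 1 Gbit/s with n = 4 -> 31.25
```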

Based on the last BMC's L, the barrel shifter at the top shifts the y buffer into the x buffer, while the second barrel shifter brings in the same amount of data from the FIFO.
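A behavioural sketch of the column-select logic and the resulting shift amount, under the assumption (not stated on the slide) that each BMC's L counts all symbols its code word would consume, including the last symbol, so that L >= 1. Because selection stops only once it has passed column n - 1, the shift is always at least n symbols (the minimum throughput) and exceeds n by an amount set by the last selected BMC's L. The function name is hypothetical.

```python
def select_columns(lengths):
    """Model of the BMC column-select logic for one clock cycle.

    lengths[j] is the L reported by the BMC of column j (assumed >= 1).
    Returns the selected columns and the shift amount for the barrel shifters.
    """
    if not lengths:
        return [], 0
    if any(L < 1 for L in lengths):
        raise ValueError("each BMC is assumed to report L >= 1")
    selected = []
    j = 0
    while j < len(lengths):
        selected.append(j)
        j += lengths[j]              # jump L columns to the right
    last = selected[-1]
    shift = last + lengths[last]     # symbols moved from the y buffer to x this cycle
    return selected, shift


print(select_columns([1, 1, 1, 1]))  # -> ([0, 1, 2, 3], 4): worst case, n symbols
print(select_columns([2, 4, 3, 1]))  # -> ([0, 2], 5): set by the last BMC's L
```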

Although the logic works in a single cycle, any amount of pipelining permitted by the user's constraints can be added to increase the speed of execution.

CONCLUSIONS

The paper presented a data compression architecture that provides a throughput of more than 1 Gbit/s.

Applying pipelining to reduce the critical path can further optimize the architecture.

Future work would also include parameterizing the different synthesizable blocks of the architecture so that an Integrated Design Environment (IDE) can be designed and developed.

The IDE, serving as a tool, would provide the flexibility of design-space exploration to generate several variants of the high-throughput architecture while optimizing a set of design parameters subject to a set of design constraints.

