Design and Implementation of Efficient Architectures for Color Space Conversion
F. Bensaali and A. Amira
School of Computer Science, Queen’s University of Belfast, University Road, BT7 1NN, Belfast, UK
[f.bensaali, a.amira]@qub.ac.uk
Abstract
Color space conversion is very important in many types of image processing applications, including video compression. This operation consumes up to 40% of the entire processing power of a highly optimised decoder. Therefore, techniques which efficiently implement this conversion are desired. This paper presents four different scalable architectures for efficient implementation of two such color space converters using an FPGA based system. The distributed arithmetic technique and systolic design have been exploited to implement the proposed structures on the Celoxica RC1000-PP FPGA development board. The implementations exhibit better performance when compared with existing implementations.
Keywords: Color space conversion, Systolic architecture, Distributed arithmetic, FPGA.
1 Introduction
Color is a visual sensation produced by light in the visible region of the spectrum incident on the retina. Since the human visual system has three types of color photoreceptor cone cells, three components are necessary and sufficient to describe a color [1].
A color space (also called a color model or color system) is a method by which we can specify, create and visualise color. There are many existing color spaces and most of them represent each color as a point in a three-dimensional coordinate system. Each color space is optimized for a well-defined application area [2]. The three most popular families of color models are RGB (used in computer graphics); YIQ, YUV and YCrCb (used in video systems); and CMYK (used in color printing). All of the color spaces can be derived from the RGB information supplied by devices such as cameras and scanners.
Processing an image in the RGB color space, with a set of RGB values for each pixel, is not the most efficient method. To speed up some processing steps, many broadcast, video and imaging standards use luminance and color difference video signals, such as YCrCb, making a mechanism for converting between formats necessary. Several cores for RGB to YCrCb conversion designed for FPGA implementation can be found on the market, such as the cores proposed by Amphion Ltd [3], CAST Inc [4] and ALMA Technologies [5].
As part of an ongoing research project at Queen’s University of Belfast to develop a hardware accelerator for image and signal processing algorithms based on matrix computations [6, 7, 8, 9], this paper proposes the use of an FPGA as a low cost accelerator for RGB ↔ YCrCb Color Space Converters (CSCs) using Systolic Architecture (SA) and Distributed Arithmetic (DA) approaches. For the second approach, two architectures based on serial and parallel manipulation of pixels have been proposed.
The target hardware for the implementation and verification of the proposed architectures is the Celoxica RC1000-PP PCI based FPGA development board equipped with a Xilinx XCV2000E Virtex FPGA [10, 11]. The rest of the paper is organised as follows. A review of the conversion from R’G’B’ to Y’CrCb is given in Section 2. Sections 3 and 4 are concerned with the mathematical backgrounds and the descriptions of the proposed architectures based on the SA and DA techniques respectively. The hardware implementations, with results and analysis, are then presented in Section 5. Finally, concluding remarks are given in Section 6.
ICGST-GVIP Journal, Volume 5, Issue1, December 2004
2 Color Space Conversion: A Review
As mentioned in the introduction, many color models have been proposed, each oriented towards supporting a specific task or solving a particular problem. Described below are the two color systems selected for our study, which are used in many image processing applications.
2.1 RGB Color Space
The RGB color space is a simple and robust color definition. RGB uses three numerical components to represent a color. This color space can be thought of as a three-dimensional coordinate system whose axes correspond to the three components, R or Red, G or Green, and B or Blue. RGB is the color space that computer displays use. It corresponds most closely to the behavior of the human eye [1]. RGB is an additive color system: the three primary colors red, green, and blue are added to form the desired color. For a true color image, the red, green, and blue components of a pixel are each eight bits wide, giving about sixteen million ($2^{24}$) possible colors. Each component has a range of 0 to 255, with all three 0s producing black and all three 255s producing white [1]. In the rest of this paper, the gamma-corrected RGB values are denoted R’G’B’.
2.2 Y’CrCb Color Space
Y’CrCb is a scaled and offset version of the YUV color space, where Y represents luminance (or brightness) and U and V are the two color difference (chrominance) components. In this color space R’G’B’ is separated into a luminance part (Y’) and two chrominance parts (Cb and Cr). Y’ is defined to have a range of 16 to 235; Cb and Cr have a range of 16 to 240 [1].
2.3 Converting From R’G’B’ to Y’CrCb
Decomposing an R’G’B’ color image into one luminance image and two chrominance images is the method that has been used in most commercial applications such as face detection [12, 13], as well as the JPEG and MPEG imaging standards [14, 15, 16]. The suitability of the Y’CrCb color space for these kinds of applications is due to:
• The lack of correlation among the Y’CrCb components, so each component can be analysed separately.
• Human eyes are more sensitive to changes of brightness than of color, so the Cr and Cb components can be compressed more heavily than the Y’ component to get a better compression ratio.
The calculation of Y’CrCb color components from R’G’B’ components consumes up to 40% of the processing power in a highly optimised decoder [14]. Accelerating this operation would therefore help to accelerate the whole process. A color in the R’G’B’ color space is converted to the Y’CrCb color space using the following equation:
$$
\begin{aligned}
Y' &= 0.257R' + 0.504G' + 0.098B' + 16 \\
Cr &= 0.439R' - 0.368G' - 0.071B' + 128 \\
Cb &= -0.148R' - 0.291G' + 0.439B' + 128
\end{aligned}
\tag{1}
$$
The inverse conversion can be carried out using the following equation:
$$
\begin{aligned}
R' &= 1.164Y' + 1.596Cr - 222.912 \\
G' &= 1.164Y' - 0.813Cr - 0.392Cb + 135.616 \\
B' &= 1.164Y' + 2.017Cb - 276.8
\end{aligned}
\tag{2}
$$
Figure 1 shows the direct mapping of equations 1 and 2.
[Figure 1: General Block Diagram for R’G’B’ ↔ Y’CrCb CSC. Nine multipliers and three adders apply the coefficients and offsets of equations 1 and 2 to the inputs R’/Y’, G’/Cb and B’/Cr; each output (Y’/R’, Cb/G’, Cr/B’) is rounded.]
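As a software reference for equations 1 and 2, the following Python sketch applies both conversions to a single pixel. It is only an illustration and not the hardware implementation described in this paper: the function names are hypothetical, rounding is done by adding 0.5 and truncating, and clamping the inverse result to the range 0 to 255 is an added assumption.

```python
import math


def rgb_to_ycrcb(r, g, b):
    """Equation 1: gamma-corrected R'G'B' (0..255) to rounded (Y', Cr, Cb)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    # Round by adding 0.5 and truncating the fractional part.
    return tuple(math.floor(v + 0.5) for v in (y, cr, cb))


def ycrcb_to_rgb(y, cr, cb):
    """Equation 2: (Y', Cr, Cb) back to rounded R'G'B'."""
    r = 1.164 * y + 1.596 * cr - 222.912
    g = 1.164 * y - 0.813 * cr - 0.392 * cb + 135.616
    b = 1.164 * y + 2.017 * cb - 276.8
    # Clamping to 0..255 is an assumption of this sketch, not taken from the paper.
    return tuple(min(255, max(0, math.floor(v + 0.5))) for v in (r, g, b))


if __name__ == "__main__":
    pixel = (200, 100, 50)
    converted = rgb_to_ycrcb(*pixel)
    print(pixel, "->", converted, "->", ycrcb_to_rgb(*converted))
```

Because both directions round to integers, a round trip is expected to reproduce the original pixel only to within a level or two.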
3 Proposed CSC based SA
An SA represents a network of Processing Elements (PEs) that rhythmically compute and pass data through the system. The main features of systolic systems are modularity and regularity, which are important in FPGA implementations [7]. In this section two architectures based on the bit-parallel SA approach for CSC implementation are described.
The CSC core implements the following mathematical formula to convert from one space to another:
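$$
\begin{bmatrix} C_0 \\ C_1 \\ C_2 \end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\times
\begin{bmatrix} B_0 \\ B_1 \\ B_2 \\ 1 \end{bmatrix}
\tag{3}
$$
where $B_i$ ($0 \le i \le 3$, with $B_3 = 1$) and $C_i$ ($0 \le i \le 2$) represent the input and output color components respectively.
Equation 3 can be mapped onto the two proposed architectures as shown in Figures 2 and 3.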
[Figure 2: Proposed systolic architecture (1). A 3 × 4 array of processing elements PE00 to PE23 with coefficients A00 to A23, inputs B0 to B3 and outputs C0 to C2. PE structure: signed integer multiplier, logical shift right (>> f), signed integer adder and storage elements (SE), with ports Aij, Bi, Cin and Cout.]
[Figure 3: Proposed systolic architecture (2). A linear array of four processing elements PE0 to PE3 with inputs B0 to B3, coefficients A00 to A23 and outputs C0 to C2.]
Since the coefficients of matrix A are real numbers, floating-point or fixed-point representations can be used to perform the multiplication. If the range of real values that must be represented is small, or can be scaled in order to make it smaller, fixed-point arithmetic is one way of providing cheap and fast non-integer support. Fixed-point arithmetic is appropriate for our application because, as can be seen from equations 1 and 2, the range of the values is small.
The first architecture consists of twelve identical PEs (the number of PEs is equal to N × M, where N and M are the number of rows and columns of the matrix A respectively). Each PE comprises a parallel fixed-point Multiply ACcumulator (MAC), a set of Storage Elements (SEs) where the coefficients $A_{ik}$ and $B_k$ are stored, and another storage element for pipelining the partial products. The MAC contains a parallel signed integer multiplier, a parallel signed integer adder and a right shifter, which shifts the multiplier output by the number of bits used for the fractional part of the color components. The input data elements $A_{ik}$ are fed in parallel, while the vector elements $B_k$ are also fed in parallel and remain fixed in their corresponding PE during the entire computation. Because of the value range of the R’G’B’ and Y’CrCb components, the input elements are represented with 13 bits (8 bits for the integer part and 5 bits for the fractional part).
The second architecture consists of four identical PEs; each PE has the same structure as the PEs used in the first architecture. The two architectures differ in throughput and in the area required. It is worth noting that with the first architecture the entire computation can be carried out after M clock cycles and requires N × M PEs, while with the second architecture the entire computation can be carried out after 2 × (M − 1) clock cycles and requires M PEs.
Table 1 illustrates the performance obtained by the two proposed architectures.
In our case the throughput rate has been defined as the reciprocal of the time between successive output vectors. It can be seen from the table that architecture (1) delivers data at a higher throughput rate than architecture (2).
The two proposed architectures (1) and (2) can be used for applications requiring a matrix-vector product, such as 3D affine transformations [8].
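The fixed-point multiply-accumulate performed by each PE can be sketched in Python as follows. This is a behavioural illustration only: the helper names are hypothetical, quantising both the coefficients and the inputs with 5 fractional bits is an assumed reading of the 13-bit format described above, and Python's `>>` is an arithmetic (sign-preserving) shift.

```python
FRAC_BITS = 5  # fractional bits of the assumed 13-bit format (8 integer + 5 fractional)

# Forward matrix of equation 1; rows are (A_i0, A_i1, A_i2, A_i3).
RGB_TO_YCRCB = [
    [0.257, 0.504, 0.098, 16.0],
    [0.439, -0.368, -0.071, 128.0],
    [-0.148, -0.291, 0.439, 128.0],
]


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantise a real number to a signed integer with frac_bits fractional bits."""
    return int(round(x * (1 << frac_bits)))


def pe_mac(c_in, a_fixed, b_fixed, frac_bits=FRAC_BITS):
    """One PE step: integer multiply, shift right by frac_bits, add to the partial sum."""
    return c_in + ((a_fixed * b_fixed) >> frac_bits)


def matvec_fixed(matrix, inputs):
    """Compute C = A x [B0, B1, B2, 1] (equation 3) by chaining PEs along each row."""
    b_fixed = [to_fixed(b) for b in inputs] + [to_fixed(1.0)]
    outputs = []
    for row in matrix:
        acc = 0
        for a, b in zip(row, b_fixed):
            acc = pe_mac(acc, to_fixed(a), b)
        outputs.append(acc / (1 << FRAC_BITS))  # back to a real value for inspection
    return outputs


if __name__ == "__main__":
    print(matvec_fixed(RGB_TO_YCRCB, [0, 0, 0]))
    print(matvec_fixed(RGB_TO_YCRCB, [255, 255, 255]))
```

With only 5 fractional bits the coefficients are coarsely quantised in this sketch, so its results can differ from the floating-point values of equation 1 by a few levels for bright pixels.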
4 Proposed CSC Based DA
Since color space conversion can be expressed as a Matrix-Vector (MV) multiplication, two algorithms based on DA are presented in this section.
DA distributes arithmetic operations rather than grouping them as multipliers do. Conventional DA, called ROM-based DA, decomposes the variable input of the inner product to bit level in order to generate precomputed data. ROM-based DA uses a ROM table to store the precomputed data, which makes it regular and efficient in its use of silicon area in a VLSI implementation. The advantage of the ROM-based DA approach is its efficiency of implementation. The basic operations required are a sequence of ROM look-ups, additions, subtractions and shifts of the input data sequence [17]. Examples of the use of DA can be found in [17, 18, 19].

Table 1: Architectures Performances
Architecture    Computation time    Area complexity    Throughput rate (vector/clock cycle)
Proposed (1)    (M)T                O(N × M)           1
Proposed (2)    2(M − 1)T           O(M)               1/N
4.1 Proposed Architecture Based on the Serial Manipulation Approach
4.1.1 Mathematical Background
Consider the matrix-vector product given by the following equation:
$$
C_i = \sum_{k=0}^{N-1} A_{ik} \times B_k \tag{4}
$$
where the $\{A_{ik}\}$ are L-bit constants and the $\{B_k\}$ are written in unsigned binary representation as shown in equation 5:
$$
B_k = \sum_{m=0}^{W-1} b_{k,m} \times 2^m \tag{5}
$$
where $b_{k,m}$ is the $m$th bit of $B_k$, which is zero or one, and W is the word length, which represents the resolution of each color component of a pixel.
Substituting 5 in 4,
$$
\begin{aligned}
C_i &= \sum_{k=0}^{N-1} A_{ik} \times \left( \sum_{m=0}^{W-1} b_{k,m} \times 2^m \right) \\
    &= \sum_{m=0}^{W-1} \left( \sum_{k=0}^{N-1} A_{ik} \times b_{k,m} \right) \times 2^m
\end{aligned}
\tag{6}
$$
Define:
$$
Z_m = \sum_{k=0}^{N-1} A_{ik} \times b_{k,m} \tag{7}
$$
Therefore, $C_i$ can be computed as:
$$
C_i = \sum_{m=0}^{W-1} Z_m \times 2^m \tag{8}
$$
The idea is that, since the term $Z_m$ depends only on the $b_{k,m}$ values and has only $2^N$ possible values, it is possible to precompute these values and store them in ROMs. An input set of N bits $(b_{0,m}, b_{1,m}, \ldots, b_{(N-1),m})$ is used as an address to retrieve the corresponding $Z_m$ value. The content of each ROM is different and depends on the coefficients of the constant matrix A. These intermediate results are accumulated over W clock cycles to produce the $C_i$ coefficients.
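The derivation in equations 4 to 8 can be checked with a short Python sketch. The function names are hypothetical, integer coefficients are used so that the comparison with the direct inner product is exact, and the choice of which address bit carries $b_{k,m}$ is an assumption of this sketch.

```python
def build_rom(coeffs):
    """Precompute all 2**N values Z = sum_k coeffs[k] * bit_k (equation 7).

    Address bit k carries b_{k,m}; this bit ordering is an assumption.
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, inputs, width):
    """Compute sum_k coeffs[k] * inputs[k] bit-serially (equation 8)."""
    rom = build_rom(coeffs)
    acc = 0
    for m in range(width):  # one clock cycle per bit position
        # Gather bit m of every input into one ROM address.
        addr = sum(((b >> m) & 1) << k for k, b in enumerate(inputs))
        acc += rom[addr] * (1 << m)  # Z_m * 2**m
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]
    inputs = [200, 17, 255, 1]
    direct = sum(c * b for c, b in zip(coeffs, inputs))
    print(da_inner_product(coeffs, inputs, 8), direct)
```

No multiplier appears in the loop: each cycle performs one table look-up, one shift and one addition, which is the property the ROM-based approach relies on.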
4.2 Case Study: Converting R’G’B’ ↔ Y’CrCb
Since all the components are in the range of 0 to 255, 8 bits are enough to represent them. In our application (N = 4 and W = 8), $C_i$ can be computed as:
$$
C_i = \sum_{m=0}^{7} Z_m \times 2^m \tag{9}
$$
where:
$$
Z_m = \sum_{k=0}^{3} A_{ik} \times b_{k,m} \tag{10}
$$
Three ROMs (one for each row of matrix A) with a size of $2^N = 2^4 = 16$ are needed in order to store the $2^4$ precomputed partial product values. Since the last element of the vector B is equal to 1:
$$
b_{3,m} =
\begin{cases}
1 & \text{for } m = 0 \\
0 & \text{for } m \neq 0
\end{cases}
\tag{11}
$$
Equation 9 can be rewritten as:
$$
C_i = \sum_{m=0}^{7} Z^*_m \times 2^m + A_{i3} \tag{12}
$$
where:
$$
Z^*_m = \sum_{k=0}^{2} A_{ik} \times b_{k,m} \tag{13}
$$
It is worth mentioning that the size of each ROM has thereby been reduced to $2^3$. Table 2 gives the content of each ROM.
Table 2: Content of the ROM i (0 ≤ i ≤ 2)
b0,m    b1,m    b2,m    Content of the ROM i
0       0       0       0
0       0       1       Ai2
0       1       0       Ai1
0       1       1       Ai1 + Ai2
1       0       0       Ai0
1       0       1       Ai0 + Ai2
1       1       0       Ai0 + Ai1
1       1       1       Ai0 + Ai1 + Ai2
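To make Table 2 concrete, the sketch below fills the eight ROM entries for each row of the forward matrix of equation 1. The function name is hypothetical and the values are kept as real numbers rounded to three decimals for display; the fixed-point word format used in hardware is not modelled here.

```python
# Forward matrix of equation 1; rows are (A_i0, A_i1, A_i2, A_i3).
RGB_TO_YCRCB = [
    [0.257, 0.504, 0.098, 16.0],
    [0.439, -0.368, -0.071, 128.0],
    [-0.148, -0.291, 0.439, 128.0],
]


def rom_contents(row):
    """Table 2: the entry at address (b0 b1 b2) is b0*A_i0 + b1*A_i1 + b2*A_i2."""
    a0, a1, a2 = row[:3]
    return [
        round(b0 * a0 + b1 * a1 + b2 * a2, 3)
        for b0 in (0, 1)
        for b1 in (0, 1)
        for b2 in (0, 1)
    ]


if __name__ == "__main__":
    for i, row in enumerate(RGB_TO_YCRCB):
        print("ROM", i, rom_contents(row), "constant term A_i3 =", row[3])
```

The constant term $A_{i3}$ is not stored in the ROM; following equation 12 it is added once per output.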
4.2.1 Proposed Architecture
Since our objective is to implement a core which performs two different color conversions (R’G’B’ ↔ Y’CrCb), 6 ROMs are needed (3 for each conversion). Figures 4 and 5 show the pins of the proposed core and its internal architecture respectively.
[Figure 4: Symbol of the CSC Core. Inputs B0, B1, B2 and S; outputs C0[0:7], C1[0:7] and C2[0:7].]
The pin descriptions are given in Table 3.
Table 3: Pins Description
Name    Dir    Description
B0      I      First input color space component
B1      I      Second input color space component
B2      I      Third input color space component
C0      O      First output color space component
C1      O      Second output color space component
C2      O      Third output color space component
S       I      Color space conversion type selection
[Figure 5: Serial CSC based DA Architecture. The input bits b0,m, b1,m and b2,m address two blocks of 3 ROMs (RGB to YCrCb and YCrCb to RGB), enabled through CE according to S; three PEs, each built from a shifter (<< m) and an adder, produce C0, C1 and C2.]
The proposed architecture consists of three identical Processing Elements (PEs) and two memory blocks. Each PE comprises a parallel ACCumulator (ACC) and a right shifter, and each memory block consists of three ROMs with a size of $2^3$ each (see Figure 6). The content of each ROM is different and depends on the coefficients of matrix A, which depend on the conversion type.
[Figure 6: Memory Block Structure. The address bits b0,m, b1,m and b2,m are applied to ROM1, ROM2 and ROM3, whose outputs are P0, P1 and P2.]
It is worth mentioning that our architecture is scalable: it can be used to perform n conversions by adding 3 × n ROMs to store the conversion matrix coefficients while always keeping the same PEs. An N × M image can be converted using the proposed architecture by setting the inputs every 8 clock cycles to the R’G’B’ components of a new pixel (Y’CrCb for the inverse conversion).
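A behavioural Python sketch of the serial core is given below: two ROM blocks selected by S, three accumulators, and eight bit-serial cycles per pixel. All names are hypothetical, real-valued ROM entries are used instead of the fixed-point words of the hardware, and two conventions are assumptions of the sketch: S = 0 selects R’G’B’ to Y’CrCb, and the inverse conversion takes its inputs in the order (Y’, Cr, Cb).

```python
# Rows are (A_i0, A_i1, A_i2, A_i3). S = 0: equation 1, S = 1: equation 2.
# The S encoding and the (Y', Cr, Cb) input order for S = 1 are assumptions.
MATRICES = {
    0: [[0.257, 0.504, 0.098, 16.0],
        [0.439, -0.368, -0.071, 128.0],
        [-0.148, -0.291, 0.439, 128.0]],
    1: [[1.164, 1.596, 0.0, -222.912],
        [1.164, -0.813, -0.392, 135.616],
        [1.164, 0.0, 2.017, -276.8]],
}


def build_rom(row):
    """Eight entries per ROM, addressed by (b0 b1 b2) with b0 as the top address bit."""
    a0, a1, a2 = row[:3]
    return [b0 * a0 + b1 * a1 + b2 * a2
            for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)]


# One block of three ROMs per conversion type.
ROM_BLOCKS = {s: [build_rom(row) for row in m] for s, m in MATRICES.items()}


def csc_serial(pixel, s):
    """Convert one pixel in 8 cycles using the ROM block selected by s."""
    roms = ROM_BLOCKS[s]
    acc = [row[3] for row in MATRICES[s]]  # accumulators start at A_i3
    for m in range(8):  # cycle m processes bit m of all three inputs
        b0, b1, b2 = ((c >> m) & 1 for c in pixel)
        addr = (b0 << 2) | (b1 << 1) | b2
        for i in range(3):
            acc[i] += roms[i][addr] * (1 << m)
    return acc


if __name__ == "__main__":
    print(csc_serial((200, 100, 50), 0))  # unrounded Y', Cr, Cb
    print(csc_serial((16, 128, 128), 1))  # unrounded R', G', B'
```

Supporting another conversion in this model amounts to adding one more entry to `MATRICES`, which mirrors the statement above that only ROMs need to be added while the PEs stay the same.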
4.3 Proposed Architecture Based on the Parallel Manipulation Approach
4.3.1 Mathematical Background
Consider an N × M image (Figure 7), where N is the image height and M is the image width.
Let each image pixel be represented by $b_{ijk}$ ($0 \le i \le N-1$, $0 \le j \le M-1$, $0 \le k \le 2$), where:
$$
\begin{aligned}
b_{ij0} &= R'_{ij} && \text{the red component of the pixel in row } i \text{ and column } j \\
b_{ij1} &= G'_{ij} && \text{the green component of the pixel in row } i \text{ and column } j \\
b_{ij2} &= B'_{ij} && \text{the blue component of the pixel in row } i \text{ and column } j
\end{aligned}
\tag{14}
$$
The image can be converted using the following mathematical formula:
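$$
\begin{bmatrix}
\begin{pmatrix} c_{000} \\ c_{001} \\ c_{002} \end{pmatrix} & \cdots & \begin{pmatrix} c_{0(M-1)0} \\ c_{0(M-1)1} \\ c_{0(M-1)2} \end{pmatrix} \\
\begin{pmatrix} c_{100} \\ c_{101} \\ c_{102} \end{pmatrix} & \cdots & \begin{pmatrix} c_{1(M-1)0} \\ c_{1(M-1)1} \\ c_{1(M-1)2} \end{pmatrix} \\
\vdots & \ddots & \vdots \\
\begin{pmatrix} c_{(N-1)00} \\ c_{(N-1)01} \\ c_{(N-1)02} \end{pmatrix} & \cdots & \begin{pmatrix} c_{(N-1)(M-1)0} \\ c_{(N-1)(M-1)1} \\ c_{(N-1)(M-1)2} \end{pmatrix}
\end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\otimes
\begin{bmatrix}
\begin{pmatrix} b_{000} \\ b_{001} \\ b_{002} \\ 1 \end{pmatrix} & \cdots & \begin{pmatrix} b_{0(M-1)0} \\ b_{0(M-1)1} \\ b_{0(M-1)2} \\ 1 \end{pmatrix} \\
\begin{pmatrix} b_{100} \\ b_{101} \\ b_{102} \\ 1 \end{pmatrix} & \cdots & \begin{pmatrix} b_{1(M-1)0} \\ b_{1(M-1)1} \\ b_{1(M-1)2} \\ 1 \end{pmatrix} \\
\vdots & \ddots & \vdots \\
\begin{pmatrix} b_{(N-1)00} \\ b_{(N-1)01} \\ b_{(N-1)02} \\ 1 \end{pmatrix} & \cdots & \begin{pmatrix} b_{(N-1)(M-1)0} \\ b_{(N-1)(M-1)1} \\ b_{(N-1)(M-1)2} \\ 1 \end{pmatrix}
\end{bmatrix}
\tag{15}
$$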
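where the operation ⊗ is defined as follows. Each vector $(c_{ij0}, c_{ij1}, c_{ij2})^T$ is the result of the product
$$
\begin{bmatrix} c_{ij0} \\ c_{ij1} \\ c_{ij2} \end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\times
\begin{bmatrix} b_{ij0} \\ b_{ij1} \\ b_{ij2} \\ 1 \end{bmatrix}
$$
where the $c_{ijk}$ represent the output image color space components and the matrix $A$, with elements $A_{00}$ to $A_{23}$, represents one of the constant matrices in equations 1 and 2.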
The $c_{ijk}$ elements (the output image color space components) can be computed using the following equation:
$$
c_{ijk} = \sum_{m=0}^{3} A_{km} \times b_{ijm} \tag{16}
$$
where the $\{A_{km}\}$ are L-bit constants and the $\{b_{ijm}\}$ are written in unsigned binary representation as shown in equation 17:
$$
b_{ijm} = \sum_{l=0}^{W-1} b_{ijm,l} \times 2^l \quad (0 \le m \le 2) \tag{17}
$$
Using the same development as in the previous section, equation 16 can be rewritten as:
$$
c_{ijk} = \sum_{l=0}^{7} Z^*_l \times 2^l + A_{k3} \tag{18}
$$
where:
$$
Z^*_l = \sum_{m=0}^{2} A_{km} \times b_{ijm,l} \tag{19}
$$
As in the first proposed architecture, the content of each ROM is different and depends on the coefficients of matrix A, which depend on the conversion type.
4.3.2 Proposed Architecture
Equation 17 can be mapped onto the proposed architecture as shown in Figure 8.
The architecture consists of 8 identical $PE_n$ ($0 \le n \le 7$). Each $PE_n$ comprises three parallel signed integer adders, three right shifters (by n bits) and one block of ROMs with the structure shown in Figure 6. It is worth noting that the architecture has a latency of W and a throughput rate equal to 1. The entire image conversion can be carried out in (Latency + (N × M) × Throughput) = 8 + (N × M) clock cycles, while using the standard algorithm (Figure 9), the [...]
[Figure 11: Handel-C design flow. Blocks shown: system-level model; HW/SW partitioning; C code (host processor), C compiler (MS Visual C++) and host processor program on the host processor platform; Handel-C code (FPGA hardware), Celoxica DK2 IDE, simulation, external cores (schematic, VHDL, CoreGen ...), EDIF, FPGA place and route with Xilinx layout tools and Xilinx JBits, FPGA bitstreams (full and partial configuration), FPGA configuration and real-time prototyping on the FPGA board of the prototyping platform.]
[...] banks. All are accessible by the FPGA and by any device on the PCI bus in parallel [10]. A schematic block diagram of the RC1000-PP board is shown in Figure 12.
[Figure 12: RC1000-PP block diagram. Blocks shown: memory banks Bank0 to Bank3, PCI, XCV2000E, DMA, Control and Status (8 bit).]
5.1 CSC Based SA
Since the last element of the vector, $B_3$, is equal to 1, the number of PEs in the two architectures shown in Figures 2 and 3 can be reduced. Figures 13 and 14 show the modified architectures. It is worth mentioning that with the first architecture the entire computation can be carried out after (M − 1) clock cycles and requires N × (M − 1) PEs, while with the second architecture the entire computation can be carried out after 2 × (M − 1) − 1 clock cycles and requires (M − 1) PEs.
During the conversion (R’G’B’ ↔ Y’CrCb), the outputs are rounded. Rounding usually looks at the fractional part and, if it is greater than or equal to 0.5, increases the result by one. This implies a condition to verify and another addition operation. A more efficient way to round a number is to add 0.5 to the result and truncate the fractional part. This technique has been applied in our implementation. The initial value for each parallel adder in the first three PEs is set to ($A_{i3}$ + 0.5), where $0 \le i \le 2$. The parallel signed adders and multipliers have been implemented using Xilinx’s CoreGen utility, which contains many designs that can often save the programmer time; CoreGen blocks can be integrated with a Handel-C program using the interface declaration [22].
[Figure 13: Modified systolic architecture (1). A 3 × 3 array of processing elements PE00 to PE22 with coefficients A00 to A22, inputs B0 to B2, initial values A03 + 0.5, A13 + 0.5 and A23 + 0.5, and outputs C0 to C2.]
[Figure 14: Modified systolic architecture (2). A linear array of three processing elements PE0 to PE2 with inputs B0 to B2, coefficients A00 to A22, initial values A03 + 0.5, A13 + 0.5 and A23 + 0.5, and outputs C0 to C2.]
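The equivalence between the two rounding methods can be illustrated with a small Python sketch. The function names are hypothetical and the 5 fractional bits are assumed from the 13-bit format described in Section 3. In the architecture itself the added 0.5 costs no extra adder because it is folded into the initial value ($A_{i3}$ + 0.5); here it is added explicitly for clarity.

```python
FRAC_BITS = 5                       # assumed number of fractional bits
HALF = 1 << (FRAC_BITS - 1)         # the value 0.5 in this fixed-point format
FRAC_MASK = (1 << FRAC_BITS) - 1    # mask selecting the fractional bits


def round_by_compare(acc):
    """Conventional rounding: inspect the fraction, then conditionally add one."""
    integer = acc >> FRAC_BITS
    fraction = acc & FRAC_MASK
    return integer + 1 if fraction >= HALF else integer


def round_by_add_half(acc):
    """Rounding used in the paper: add 0.5, then truncate the fractional part."""
    return (acc + HALF) >> FRAC_BITS


if __name__ == "__main__":
    # 122.7 is approximately 3926 / 32 in this format.
    sample = 3926
    print(round_by_compare(sample), round_by_add_half(sample))
```

The second form needs no comparison, which is why it maps to a single addition in hardware.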
5.2 CSC Based DA
This section describes the hardware implementation of the CSCs based on DA principles. The ROMs have been implemented using the LUTs of the FPGA Configurable Logic Blocks (CLBs), whose RAM and ROM capabilities allow very fast and efficient designs to be created [23]. Tables 4 and 5 give the content of the ROMs used for the R’G’B’ to Y’CrCb and Y’CrCb to R’G’B’ conversions respectively, for both architectures.
The second proposed architecture can be used for the inverse conversion (Y’CrCb to R’G’B’) by:
• Duplicating the ROMs using the same implementation approach used for the first architecture (with a selector signal which allows the user to choose the appropriate converter); or
• Setting the contents of the ROMs in advance, depending on the desired conversion.

Table 4: The Content of the ROMs (R’G’B’ to Y’CrCb)
R_m / R_{ij0,l}    G_m / G_{ij1,l}    B_m / B_{ij2,l}    ROM1     ROM2      ROM3
0                  0                  0                  0        0         0
0                  0                  1                  0.098    -0.071    0.439
0                  1                  0                  0.504    -0.368    -0.291
0                  1                  1                  0.602    -0.439    0.148
1                  0                  0                  0.257    0.439     -0.148
1                  0                  1                  0.355    0.368     0.291
1                  1                  0                  0.761    0.071     -0.439
1                  1                  1                  0.859    0         0

Table 5: The Content of the ROMs (Y’CrCb to R’G’B’)
Y_m / Y_{ij0,l}    Cr_m / Cr_{ij1,l}    Cb_m / Cb_{ij2,l}    ROM1     ROM2      ROM3
0                  0                    0                    0        0         0
0                  0                    1                    0        -0.392    2.017
0                  1                    0                    1.596    -0.813    0
0                  1                    1                    1.596    -1.205    2.017
1                  0                    0                    1.164    1.164     1.164
1                  0                    1                    1.164    0.772     3.181
1                  1                    0                    2.76     0.351     1.164
1                  1                    1                    2.76     -0.041    3.181
The precomputed partial products are stored in the ROMs using a 13-bit fixed-point representation (8 bits for the integer part and 5 bits for the fractional part). 13-bit arithmetic is used inside the architecture. The inputs and outputs of the two architectures are represented using 8 bits and the outputs are rounded. As in the SA-based CSC implementation, the same rounding technique is applied here. The initial value for each accumulator $ACC_i$ is set in advance to ($A_{i3}$ + 0.5), where $0 \le i \le 2$.
The MACs and parallel signed adders have been implemented using Xilinx’s CoreGen utility [22]. The shifters and the ROM initialisation have been implemented using VHDL. All design components have been connected together using Handel-C.
In order to make a fair and consistent comparison with the existing FPGA based color space converters, the XCV50E-8 FPGA device has been targeted. Table 6 illustrates the performance obtained for the proposed architectures in terms of the area consumed and the speed which can be achieved.
The proposed DA architectures based on the serial and parallel manipulation approaches show significant improvements in comparison with the existing implementations [3, 4, 5], which perform the R’G’B’ to Y’CrCb conversion, in terms of the area consumed and the maximum running clock frequency. The advantage of the two other proposed architectures is that they can be used for any color space conversion based on equation 3.
Table 7 compares the hardware and software implementations, when using the second proposed DA architecture, in terms of the computation time and the RMS error (which is due to the different data representations used in the two implementations):
$$
\text{RMS Error} = \sqrt{\frac{1}{N \times M} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left( I_{soft}(i,j) - I_{hard}(i,j) \right)^2}
$$
Table 7 shows the test results for two different images (the Baboon image (512 × 512) and the Pepper image (256 × 256)). It can be seen that the same converted image can be obtained faster when using the FPGA implementation, with a small error (due to the different data representations used in the two implementations).
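For reference, the RMS error between a software-converted and a hardware-converted image plane can be computed as in the following Python sketch. The function name is hypothetical, images are represented as plain nested lists of numbers, and the square root follows the usual definition of a root-mean-square error.

```python
import math


def rms_error(soft, hard):
    """Root-mean-square difference between two equally sized image planes."""
    rows, cols = len(soft), len(soft[0])
    total = sum(
        (soft[i][j] - hard[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    )
    return math.sqrt(total / (rows * cols))


if __name__ == "__main__":
    software_plane = [[16, 128], [235, 90]]   # illustrative values only
    hardware_plane = [[16, 129], [234, 90]]
    print(rms_error(software_plane, hardware_plane))
```

The function would be applied once per component (Y’, Cr and Cb) to obtain per-component figures of the kind reported in Table 7.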
6 Conclusion
Processing an image in the RGB color space, with a set of RGB values for each pixel, is not the most efficient method. To speed up some processing steps, many broadcast, video and imaging standards use luminance and color difference video signals, such as YCrCb, making a mechanism for converting between formats necessary. In this paper, novel scalable architectures based on the DA and SA approaches for RGB ↔ YCrCb conversions, which require enormous computing power, have been reported. The implementation results show the effectiveness of the DA approach. The performance of the proposed architectures, in terms of the area used and the maximum running frequency, has been assessed and shows that the proposed system requires less area and can run at a higher frequency than existing systems. The proposed systolic structures can perform other conversions based on matrix-vector multiplication, while the DA structure can be used for other conversions by modifying the content of the ROMs.

Table 6: Performance comparison with existing CSC cores
Design                          Slices    Speed (MHz)
Proposed SA architecture (1)    305       68
Proposed SA architecture (2)    1022      72
Proposed DA architecture (1)    70        128
Proposed DA architecture (2)    193       234
CAST Inc [4]                    222       112
ALMA Technologies [5]           222       105
Amphion Ltd [3]                 204       90

Table 7: Software/hardware implementation comparison for the RGB to YCrCb CSC
Original image        RMS error (Y)    RMS error (Cr)    RMS error (Cb)    Software time (ms)    Hardware time (ms)
Baboon (512 × 512)    0.487            0.630             0.461             126                   1.2
Pepper (256 × 256)    0.684            0.830             0.396             43                    0.28
References
[1] B. Payette, “Color Space Converter: R’G’B’ to Y’CrCb,” Xilinx Application Note, XAPP637, V1.0, September 2002.
[2] R.C. Gonzalez and R.E. Woods, “Digital Image Processing,” Second Edition, Prentice Hall Inc, 2002.
[3] Datasheet (www.amphion.com), “Color Space Converters,” Amphion Semiconductor Ltd, DS6400 V1.1, April 2002.
[4] Application Note (www.cast-inc.com), “CSC Color Space Converter,” CAST Inc, April 2002.
[5] Datasheet (www.alma-tech.com), “High Performance Color Space Converter,” ALMA Technologies, May 2002.
[6] F. Bensaali and A. Amira, “Design and Efficient FPGA Implementation of an RGB to YCrCb Color Space Converter Using Distributed Arithmetic,” Proceedings of the International Conference on Field Programmable Logic (FPL), Lecture Notes in Computer Science, to be published by Springer Verlag, August 2004.
[7] A. Amira, “A custom Coprocessor for Matrix Algorithm,” PhD thesis, Queen’s University of Belfast, 2001.
[8] F. Bensaali, A. Amira, I.S. Uzun and A. Ahmedsaid, “An FPGA Implementation of 3D Affine Transformations,” The 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS’03), Sharjah, UAE, December 2003.
[9] F. Bensaali, A. Amira, I.S. Uzun and A. Ahmedsaid, “Efficient Implementation of Large Parallel Matrix Product for DOTs,” The International Conference on Computer, Communication and Control Technologies (CCCT’03), Florida, USA, July 2003.
[10] Datasheet (www.celoxica.com), “RC1000 Reconfigurable hardware development platform,” Celoxica Ltd., 2001.
[11] URL: www.xilinx.com
[12] A. Albiol, L. Torres and E.J. Delp, “An unsupervised color image segmentation algorithm for face detection applications,” In Proceedings of the International Conference on Image Processing, pp 681-684, Vol. 2, October 2001.
[13] P. Kuchi, P. Gabbur, P.S. Bhat and S. David, “Human Face Detection and Tracking using Skin Color Modelling and Connected Component Operators,” The IETE Journal of Research, Special issue on Visual Media Processing, May 2002.
[14] M. Bartkowiak, “Optimisations of Color Transformation for Real Time Video Decoding,” Digital Signal Processing for Multimedia Communications and Services, EURASIP ECMCS 2001, Budapest, September 2001.
[15] J.L. Mitchell and W.B. Pennebaker, “MPEG Video Compression Standard,” Chapman & Hall, 1996.
[16] J. Bracamonte, P. Standelmann, M. Ansorge and F. Pellandini, “A Multiplierless Implementation Scheme for the JPEG Image Coding Algorithm,” IEEE Nordic Signal Processing Symposium, Kolmarden, Sweden, June 13 - 15, 2000.
[17] A. Amira, “An FPGA Based Parameterisable System For Discrete Hartley Transforms Implementation,” Proceedings of The International Conference on Image Processing (ICIP), Barcelona, Spain, September 2003.
[18] H. Ohlsson and L. Wanhammar, “Maximally fast numerically equivalent state-space recursive digital filters using distributed arithmetic,” Proceedings of the IEEE Symposium in Nordic Signal Processing (NORSIG2000), Kolmarden, Sweden, pp 295-298, June 2000.
[19] O. Gustafsson and L. Wanhammar, “Implementation of a Digital Beamformer in an FPGA using Distributed Arithmetic,” Proceedings of the IEEE Symposium in Nordic Signal Processing (NORSIG2000), Kolmarden, Sweden, pp 295-298, June 2000.
[20] Manual (www.celoxica.com), “Handel-C Language Reference Manual,” Celoxica Ltd., 2003.
[21] URL: www.celoxica.com
[22] Application Note, “Xilinx CoreGen and Handel-C,” AN 58 v1.0, 2001.
[23] M. Defossez, “Using the Virtex Look-Up Tables,” Xilinx Application Note (www.xilinx.com).