
HDRchitecture: Real-Time Quantigraphic High Dynamic Range

(HDR) Imaging on Field-Programmable Gate Array (FPGA) for WearCam

(Wearable Camera)

The Edward S. Rogers Sr. Department of Electrical & Computer Engineering

Tao Ai

Department of Electrical & Computer Engineering

University of Toronto

A thesis submitted in conformity with the requirements for the degree of

Master of Applied Science

© Copyright by Tao Ai 2014


Abstract

HDRchitecture: Real-Time Quantigraphic High Dynamic Range Imaging on Field-Programmable Gate Array for Wearable Camera

Master of Applied Science, 2014

Tao Ai

Department of Electrical and Computer Engineering

University of Toronto

An important feature of wearable camera systems is that they must work in a wide variety of situations under a wide range of lighting conditions. This is true whether they are used to take pictures, record video, or even perform object recognition. For these cameras to be used as a real-time seeing aid, we need to produce High Dynamic Range (HDR) video at a high frame rate.

With the advance of technology in the Field-Programmable Gate Array (FPGA) industry, the increasing number of successful implementations of real-time imaging systems makes the FPGA an attractive platform for prototyping HDR video systems for wearable computers. In this thesis, a scalable and adaptable framework for real-time quantigraphic HDR imaging is implemented on an FPGA to address the dynamic range limitation of cameras. Most of the implementation issues found in the essential stages of a typical HDR imaging flow are addressed in detail.


To Mom, Dad and Emma...


Acknowledgements

The project of HDRchitecture started in my fourth year of undergraduate studies and evolved into a project on which I wanted to spend another two years. During the implementation of this work, my thesis advisor, Prof. Steve Mann, was willing to allow me to continue to explore based on my own interests. The amount of freedom he extended to me was truly appreciated. Besides all the suggestions he gave on my work, which helped my thinking and improved my results, he also demonstrated to me a synergy of art, science, and technology as a polymath.

I would also like to thank my colleagues and mentors: Raymond Lo, for the amount of interaction, advice, and encouragement during the early prototyping of the system; Kalin Ovtcharov, who taught me the most about the FPGA implementation of a video-based project and led the team that built the backbone of the first prototype; and Mir Adnan Ali, for the tremendous amount of support and input in laying out the mathematical frameworks of the later compositing methods. I also received many suggestions from my colleagues Valmiki Rampersad and Jason Huang, who contributed to the predecessor of HDRchitecture (i.e., a real-time implementation of HDR video using a GPU).

HDRchitecture is a project that combines the work of many great individuals. Thanks to all those who contributed to the project: to David Dai for the convolution circuit, to Calvin Ngan for the window provisioning buffer, to Shane Gu for the 4-up display, and to Sarmad Zulfiqar for the addressing circuit. Thanks also to Akshay Gill, Jose Emilio, and Ling Zhong for their efforts in implementing the domain-transform algorithm on FPGA.

Many great comments have come from Prof. Paul Chow, Prof. Jason Anderson, and Prof. Vaughn Betz, when I was taking their courses and used course-related projects to investigate issues arising from the implementation of HDRchitecture. I am very grateful for the support provided by Prof. Greg Steffan, who helped me intensively in improving the addressing circuit architecture. His advice was invaluable during the development of the later compositing circuit.

Finally, I want to thank my parents and Emma for their support and encouragement during the time I worked on this thesis. As an international student who spent seven years abroad, what I owe to my parents is not only a financial debt, but also the family time that I missed during the pursuit of my academic degrees.


Contents

List of Figures viii

List of Tables x

Chapter 1 Introduction 1
1.1 Goal and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Quantigraphic HDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 FPGA as an Implementation Platform . . . . . . . . . . . . . . . . . . . . 3

1.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.5 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Chapter 2 Background and Previous Work 8
2.1 Quantigraphic Image Processing . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 Video Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3 Camera Response Function Recovery . . . . . . . . . . . . . . . . . . . . 11

2.3.1 Mann and Picard, 1993 . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3.2 Debevec and Malik, 1997 . . . . . . . . . . . . . . . . . . . . . . . 12

2.3.3 Mitsunaga and Nayar, 1999 . . . . . . . . . . . . . . . . . . . . . . 13

2.3.4 Robertson, Borman, and Stevenson, 1999 . . . . . . . . . . . . . . 13

2.3.5 Grossberg and Nayar, 2003 . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Comparametric Camera Response Function . . . . . . . . . . . . . . . . . 14

2.4.1 CCRF Composition . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.5 Spatial Tonal Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Chapter 3 HDRchitecture 18
3.1 Capturing Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2 Frame Exposure Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


3.3 HDR Frame Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4 HDR Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.5 Spatial Tonal Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Chapter 4 HDR Frame Buffer 24
4.1 Video Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1.1 Address Generator . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1.2 Double Line Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.1.3 Memory Burst Controller . . . . . . . . . . . . . . . . . . . . . . . 30

4.2 AXI4 Read/Write Controller . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.3 Resource Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Chapter 5 Composition with Weighted-Sum 37
5.1 Straight-Forward Implementation . . . . . . . . . . . . . . . . . . . . . . . 37

5.2 Adjusting Equation for Improvement . . . . . . . . . . . . . . . . . . . . . 39

5.3 Weighting Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5.4 Determine the Word-Lengths . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.5 Result of Weighted-Sum Implementation . . . . . . . . . . . . . . . . . . . 42

5.5.1 Resource Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Chapter 6 Composition with Quadtree CCRF 44
6.1 Quadtree CCRF Composition . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.2 Resource Limitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

6.3 Quadtree Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.4 Circuit Architecture for 2D Function Evaluation . . . . . . . . . . . . . . . 47

6.5 Multiplexer Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

6.6 Interpolation Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6.7 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6.8 Result of Quadtree CCRF Implementation . . . . . . . . . . . . . . . . . . 54

6.8.1 Resource . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.8.2 Error and Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.9 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Chapter 7 Spatial Tonal Mapping 58


7.1 Color Space Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 Window Provisioning Buffer . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.3 Convolution and Edge Enhancement . . . . . . . . . . . . . . . . . . . . . 60
7.4 Resource Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Chapter 8 Conclusion and Future Work 62
8.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Appendix A CCRF MUX Network with Nested IF-ELSE Statements 64

Appendix B CCRF MUX Network with Separated Instantiations 67

References 70


List of Figures

1.1 Overview of HDRchitecture Flow . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Application Example of HDR Used on Welding . . . . . . . . . . . . . . . 4

1.3 Application Example of HDR Used on Driver Assistant . . . . . . . . . . . 5

1.4 Application Example of HDR Used on Abakography . . . . . . . . . . . . 6

2.1 Mann’s 1993 Patent on Quantigraphic HDR Processing . . . . . . . . . . . 17

3.1 Top-Level Block Diagram of HDRchitecture . . . . . . . . . . . . . . . . . 18

3.2 Raw Video Containing Temporally Varying Exposures . . . . . . . . . . . 19

3.3 Top-Level Block Diagram of HDR Frame Buffer . . . . . . . . . . . . . . 20

3.4 The General Composition Architectures . . . . . . . . . . . . . . . . . . . 21

3.5 Example of the Four-Up Display . . . . . . . . . . . . . . . . . . . . . . . 22

4.1 A Common HDR Video Buffering Mistake . . . . . . . . . . . . . . . . . . 24

4.2 Example - Bracket Sliding Operation . . . . . . . . . . . . . . . . . . . . . 25

4.3 Clock Domains of HDR Frame Buffer . . . . . . . . . . . . . . . . . . . . 26

4.4 Frame Buffer Read/Write Paths . . . . . . . . . . . . . . . . . . . . . . . . 27

4.5 Top-Level Block Diagram of the Video Interface . . . . . . . . . . . . . . . 29

4.6 Double-Line Buffer Operation . . . . . . . . . . . . . . . . . . . . . . . . 30

4.7 State Diagram of the Memory Burst FSM . . . . . . . . . . . . . . . . . . 31

5.1 Weighted-Sum Composition Circuit Before Improvement . . . . . . . . . . 38

5.2 Improved Weighted-Sum Compositing Circuit . . . . . . . . . . . . . . . . 40

5.3 Weighting Scheme Example . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.4 Weighted-Sum Composition Used on EyeTap . . . . . . . . . . . . . . . . 43

6.1 The Pairwise Topology of CCRF-Based Composition . . . . . . . . . . . . 45

6.2 Quadtree Representation of CCRF . . . . . . . . . . . . . . . . . . . . . . 47

6.3 Indirect Table Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48


6.4 Quadtree CCRF Compositing Circuit . . . . . . . . . . . . . . . . . . . . . 49
6.5 A Simple CCRF with Quadtree Compression and Multiplexers . . . . . . . 51
6.6 Interpolation Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.7 Area and Error vs Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.8 High-Definition HDR Video based on Quadtree Composition . . . . . . . . 56

7.1 Spatial Tonal Mapping Circuit . . . . . . . . . . . . . . . . . . . . . . . . 58
7.2 Convolution with Binary Kernel . . . . . . . . . . . . . . . . . . . . . . . 60

8.1 Real-Time HDR on FPGA Demonstration . . . . . . . . . . . . . . . . . . 62


List of Tables

4.1 Video Interface Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Write Channel Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Resource Usage of the HDR Frame Buffer . . . . . . . . . . . . . . . . . . 36

5.1 Simulation Results for Determining the Word-Lengths . . . . . . . . . . . 42
5.2 Resource Usage of the Weighted-Sum Compositing Circuits . . . . . . . . 43

6.1 Resource Usage of the CCRF Compositing Circuit . . . . . . . . . . . . . 53
6.2 CCRF Composition Error vs. Number of Leaves . . . . . . . . . . . . . . 54

7.1 Resource Usage of the Tonal Mapping Circuit. . . . . . . . . . . . . . . . 61


Chapter 1

Introduction

This thesis investigates and discusses the design and implementation of a real-time high dynamic range imaging architecture on a field-programmable gate array, using quantigraphic imaging methods.

1.1 Goal and Motivation

Quantigraphic high dynamic range imaging, as a technology, aims to solve the most common difficulty in cameras, limited dynamic range, by computationally capturing, processing, and manipulating digital images with a recovered camera response function. Essentially, it creates device-independent, scene-referred tonal values by means of oversampling [3, 4, 5].

A digital image is a discretization of both the domain and the range of a continuous mathematical color image model. It associates the tonal values in its range with the spatial coordinates in its domain. Thus, a discretized binary representation of an image has two characteristics, spatial resolution and tonal resolution, for the image plane and the chosen color space, respectively. The tonal resolution, which ultimately determines how many different colors can be represented, is measured using "Dynamic Range": the ratio between the largest and smallest non-negative quantity for which a small incremental difference in the quantity can still be sensed by the sensor [30].

When evaluating the quality of digital cameras, emphasis needs to be given to both resolutions. Until now, camera manufacturers have featured sensors with ever-increasing spatial resolution, usually measured in megapixels; most digital cameras can already take images of higher resolution than most displays can show. However, because of its important role in image processing applications, dynamic range (i.e., tonal resolution) is now getting more attention than before.


1.2 Quantigraphic HDR

The history of high dynamic range digital photography goes back almost two decades, asRobertson et al. state:

“The first report of digitally combining multiple pictures of the same scene to improve dynamic range appears to be Mann” [36].

The mathematical framework of digital HDR imaging was first published in 1993 by Mann and Picard [26] and later summarized by Mann as "Quantigraphic Image Processing" [21]. To produce HDR photography [19, 41] and video [14, 17, 24, 39] in real time using an FPGA, the prior work on real-time compositing [30] takes raw video input produced by a customized camera that alternates exposures in rapid succession. The theory of quantigraphic image processing, which originated in the field of wearable computing to address the dynamic range limitation of wearable camera sensors, has evolved into a large field that impacts many other areas, both artistically and scientifically. As a result, an extensive literature has appeared over the past two decades.

The term "HDR" is now overloaded, covering any method that produces images with apparently improved tonal resolution. For clarity, the term "HDR" used in this thesis shall mean the theory of quantigraphic image processing, which will be introduced shortly.

When a sensor is exposed to light with a certain spectral distribution, the resulting measurable value registered by the sensor is a function of the incident energy reaching it. It corresponds to the integration of the electromagnetic energy flux both in time and over a region of space. In quantigraphic image processing, this value is called the photoquantity. Quantigraphic HDR produces HDR images by combining photoquantities estimated from differently exposed images of the same subject matter.

One thing to note is that, in the literature, this intensity value is often mistakenly described as a radiometric or a photometric quantity. Radiometry measures electromagnetic radiation, including visible light, in terms of absolute power. Photometry, as distinct from radiometry, weights the radiant power at each wavelength by a luminosity function that models the brightness sensitivity of human eyes. Digital sensors have their own, different response function for brightness. Once that response is recovered, it allows us to use digital cameras as measuring devices for visible light. The output, though, is weighted by the sensor's response and is therefore neither a radiometric nor a photometric quantity.


The characterization of this nonlinear response is a main topic in quantigraphic imaging, which will be introduced in the later chapters.

1.3 FPGA as an Implementation Platform

Quantigraphic HDR, as a method, is not limited to a specific sensor type. In fact, it is also beneficial for sensors whose architectures are already designed to slightly extend their dynamic ranges (e.g., multi-linear pixels, companding analog-to-digital converters). The quantigraphic imaging implementation needs to be general and programmable for different camera responses.

As the most popular implementation medium for digital circuits, an FPGA offers a much shorter turnaround time for circuit designers, making it suitable for prototyping and exploring circuit architectures. The existing software methods for extending the dynamic range of sensors cannot run in real time on portable systems because of the high bandwidth requirement of the computation. Previous implementations of quantigraphic HDR have been accomplished with high-performance computers equipped with a Graphics Processing Unit (GPU) [30]. Compared to its GPU/CPU counterparts, an FPGA implementation performs at a much higher frame rate with less power [38].

The proposed general flow for FPGA-based quantigraphic HDR imaging is given in Figure 1.1 and explained in the later chapters.

1.4 Applications

HDR is essential in imaging applications where the luminance of a scene is not under the user's control. In some of these applications, the dynamic range limitation exists only in the camera, and the dynamic range of the scene can be properly interpreted by human eyes. In other applications, however, the scene cannot be captured by either cameras or human eyes.

A welding scene contains a dynamic range of more than 80 dB (i.e., 100 million to one). Any camera exposed to this lighting must be adjusted to its minimum exposure setting, and except for the tip of the welding arc, the resulting image contains no detail in the background. This is similar to what a welder sees behind the darkening shade of the helmet. It is one of the most extreme situations for HDR cameras, since the scene dynamic range goes beyond that of both the human eye and the camera.


[Figure 1.1 flow diagram: WYCKOFF SET OF SIZE N → CRF RECOVERY → CRF → HDL CODER / LUT CODER → HDL / 1D-2D LUT → FPGA CAD → BITFILE]

Figure 1.1: Overview of HDRchitecture Flow - The general flow of HDRchitecture. Given a Wyckoff set of size N and user constraints, the bitfile is generated for programming the FPGA.

Figure 1.2: Application Example of HDR Used on Welding - This image shows a real-time HDR video stream of the scene, augmented with assistive information about welding status and position from computer vision algorithms [17].


Welding stations are located at a fixed distance from the user, so the resulting HDR images captured by a wearable camera mounted on a helmet can be streamed to a stereo display worn by the user without changes in focal length.

Cameras have also become an essential part of intelligent driver-assist technology. A live video stream is captured by the camera and fed into the system for image processing and computer vision. For example, the lane departure warning system is designed to warn a driver when the vehicle begins to move out of its lane on the freeway. More commonly seen on cars these days is a panoramic camera set that shows live video of the scene behind the vehicle, overlaid with static and dynamic guide lines indicating steering angles and distances. In all of the above scenarios, cameras risk over- or under-exposing the scene simply because we have no control over the lighting.

Figure 1.3: Application Example of HDR Used on Driver Assistance - This image shows a live HDR video feed of a driving scene produced by the HDRchitecture prototype. Without a proper solution to the dynamic range limitation, video streams captured by car-mounted cameras will be over- or under-exposed.

Wearable computing is a prevailing topic in 2014. As wearables move away from the wrist, cameras will play a more important role in the field. Augmented Reality (AR) is changing the way we view the world by overlaying informative graphics on the user's field of view. When simply used as input devices, cameras with greater dynamic range provide better samples for computer vision algorithms. Moreover, this relates AR to a more general concept called Mediated Reality. Computer-mediated reality is able to alter one's perception of reality by adding (augmenting) to, or subtracting (diminishing) from, the original scene. In this regime, HDR is an embodiment of mediated reality, since it effectively adds information to the originally dark parts of the scene and subtracts light from the over-exposed parts of the image. Since it is the user's visual perception of the environment, which can itself be treated as the output of a sensor (i.e., our eyes), that is mediated, a real-time HDR video camera can extend the ability of human eyes by altering a video stream of light that would normally have reached the user's eye directly. Therefore, it is feasible to use a wearable quantigraphic HDR camera as a seeing aid to enhance human visual perception.

Veillance applications include both surveillance and sousveillance, whether for law enforcement or for personal life-logging. A surveillance camera mounted near a traffic light has to be able to see the license plates of oncoming vehicles clearly, even at night when the vehicles' headlights are very bright. During the day, however, surrounding buildings might cast shadows on one of the streets being monitored at the intersection. Without enough dynamic range, the camera can see either the "brighter" or the "darker" street at one time, depending on the currently set exposure. The scenario for sousveillance is similar to that of the wearable camera. When used for life-logging, a greater dynamic range provides more information for the user. When used for proof of alibi, HDR makes footage computationally harder to forge because it normally captures much more detail on reflective surfaces.

Figure 1.4: Application Example of HDR Used on Abakography - This image shows the use of a light-painting user interface. When combined with HDR, computational light-painting results can be manipulated as a new form of data entry.

HDR can be used to improve the display of computational light-painting results made by "abakography". A new field explored recently by Mann [29], abakography uses light itself as a user interface. By viewing a 3D abakograph in augmented-reality glasses, it is possible to either draw an abakograph tunnel or corridor, or simply experience a predefined tunnel by walking through it. Comparametric image processing is used to quantimetrically combine the abakograph with features of the human subject and the external scene. Abakographs can also be used to visualize veillance or electromagnetic waves. Preliminary experiments show promise in enabling the creation of aesthetically pleasing and sensuous forms with ease, forms which in many ways maintain the organic feel of traditional sculpture while providing the ease of experimentation of digital and computational photography.

1.5 Thesis Organization

The work discussed in this thesis is an FPGA implementation of a real-time HDR imaging system. The next chapter (Chapter 2) provides background information on the mathematical framework of quantigraphic HDR, along with background material on HDR video capture and display techniques. Chapter 3 introduces the implementation at the top level and gives details on the prototyping methods used; the entire flow of HDR video data and the connections between modules are given in that chapter.

Chapters 4 to 7 provide detailed information on the implementation of each module introduced in Chapter 3. They address the difficulties encountered and investigate possible design decisions for solving them, with the results of each implementation given at the end of each chapter. Chapter 4 discusses the design of the HDR frame buffer that prepares the raw input video for composition. Chapter 5 compares several implementations of the traditional composition method called weighted-sum. Chapter 6 makes extensive use of multiplexers to achieve two-dimensional table lookup, implementing another, statistical compositing method. In Chapter 7, the tonal mapping algorithm is implemented to post-process the HDR result for commercial displays. The final chapter (Chapter 8) summarizes the conclusions of this thesis and proposes future work for further investigation.


Chapter 2

Background and Previous Work

2.1 Quantigraphic Image Processing

Quantigraphic HDR differs from other HDR methods in the following characteristics:

• It combines several differently exposed digital images of the same subject matter to obtain a single image with extended dynamic range and improved color fidelity.
• It requires no knowledge of the response function of the imaging device. In other words, it recovers the response function from the image set in a self-calibration process and then maps each pixel into a corresponding photoquantity prior to the composition.

To introduce the concept of quantigraphic image processing as a means of extending the dynamic range of digital images, let us start by modeling an image. Images produced by modern digital cameras consist of pixel values f(x) that carry tonal information for the different color channels, over a domain x = (x, y) of spatial coordinates. Let us then consider a set of N images of the same subject matter, indexed by i ∈ {1, . . . , N}, with each image differing only in its relative exposure ratio k_i. The resulting set of functions,

f_i(x) = f(k_i q(x)),    (2.1)

where the k_i are scalar constants for each photoquantity q(x), is known as a Wyckoff set [20, 27]. Thus, for each image in a Wyckoff set, f_i(x) represents the pixel at location x = (x, y) of the image with the i-th exposure.

To model the capture process of color images with digital cameras, we need to understand how a camera works. Each element of a photosensitive sensor array has a spectral response function, s(λ). The light emitted from the scene can be characterized by its spectral distribution, which maps each wavelength to a measurement of the associated radiant energy. When the sensor is exposed to light from the scene with spectral distribution c(λ), the sensor


measures the quantity of light over this spectral response as:

q = ∫_λ c(λ) s(λ) dλ.    (2.2)

Thus, a measurable digital value q, called the photoquantigraphic quantity (or photoquantity for short), is given as the result of the exposure registered by the sensor element. The photoquantity is the quantity of light integrated over the spectral response of the sensor element of a particular camera system. The earlier notions from radiometry and photometry, used in some HDR literature, are not used in our discussion, to avoid confusion: photometric terms are simply radiometric terms weighted by the spectral response of the human visual system, and this response is not necessarily the same as the camera's. The photoquantity is weighted by the sensor response and thus contains all the information we need to recover a mapping that is linear in the quantity of light arriving at the sensor. We are essentially turning the camera from a sensing device into a measuring device.

Quantigraphic HDR composition combines the photoquantities q_i(x) from each differently exposed image to obtain an estimate q̂(x) for the scene. Pixel values f_i(x) cannot be used directly for composition because they are not linearly related to the amount of light. To linearize the data and perform a meaningful comparison between differently exposed f_i(x) values, we need to know how the digital camera maps photoquantity values to the corresponding pixels. This mapping f(q_i(x)) is called the Camera Response Function. We model the mapping mathematically as,

f(x) = f(q(x)). (2.3)

A digital camera, within a single exposure setting, cannot map all values of the photoquantity when the scene is too bright or too dark. This is exactly the problem of limited dynamic range that we are trying to address. The single-exposure response function model taking this into account is given as follows [35]:

p(x) =
  0,        if q(x) ∈ [0, Q_0]
  f(q(x)),  if q(x) ∈ (Q_{m−1}, Q_m], m = 1, …, 254
  255,      if q(x) ∈ [Q_255, ∞),    (2.4)

where the Q_m values each represent the amount of light actually falling on the sensor. By taking images with different exposures, we can estimate photoquantity values that are outside the response range of any single-exposure capture, thus extending the dynamic range of the camera up to its Dynamage Range [30]. The dynamage range is defined as follows:

Dynamage range is the ratio between the largest quantity that will not damagea sensor or device or receiver, and the smallest non-negative quantity for whichchanges in the quantity remain discernible.

The empirical assumption that the camera response function is monotonically increasing gives us a well-defined inverse function, making the computation from f_i(x) to q_i(x) possible:

q_i(x) = f^{-1}(f_i(x)).    (2.5)

For images in a Wyckoff set, the exposure separation is kept constant. The pixels contained in the i-th image,

f_i(x) = f(k_i q_i(x)),    (2.6)

can be used to estimate the photoquantity, shifted according to the exposure value to the reference exposure, using the following inverse mapping,

q̂_i(x) = (1/k_i) f^{-1}(f_i(x)),    (2.7)

where q̂_i(x) denotes the estimate of q_i obtained from the corresponding pixel of the i-th exposed image. By using the inverse camera response function, photoquantities from the multiple differently exposed images can be combined into a single estimate of the photoquantity q̂(x), which can then be turned into an image of greater dynamic range.
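To make eqs. (2.5)-(2.7) concrete, the sketch below (Python/NumPy) simulates a Wyckoff set using a hypothetical gamma-type response standing in for a real recovered CRF, then combines the per-exposure estimates of eq. (2.7); the mid-tone certainty weighting is only a placeholder for the certainty functions discussed later.

```python
import numpy as np

# Hypothetical camera response f(q) = clip(q)^(1/2.2); a real system
# would use the recovered CRF instead (Sections 2.3-2.4).
def f(q):
    return np.clip(q, 0.0, 1.0) ** (1.0 / 2.2)

def f_inv(v):
    return v ** 2.2

def estimate_photoquantity(frames, k):
    """Combine a Wyckoff set into one photoquantity estimate q_hat(x).

    frames: list of N images f_i(x) in [0, 1], same subject matter
    k:      list of N exposure ratios k_i relative to the reference
    """
    q_sum = np.zeros_like(frames[0])
    w_sum = np.zeros_like(frames[0])
    for fi, ki in zip(frames, k):
        qi = f_inv(fi) / ki                  # eq. (2.7): shift to reference
        w = 1.0 - 2.0 * np.abs(fi - 0.5)     # crude mid-tone certainty
        w = np.where((fi <= 0.0) | (fi >= 1.0), 0.0, w)  # drop clipped pixels
        q_sum += w * qi
        w_sum += w
    return q_sum / np.maximum(w_sum, 1e-9)

# Synthetic three-exposure Wyckoff set over a high-dynamic-range ramp:
q_true = np.geomspace(1e-3, 1e2, 512)
ratios = [0.25, 1.0, 4.0]
frames = [f(ki * q_true) for ki in ratios]   # eq. (2.1)
q_hat = estimate_photoquantity(frames, ratios)
```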

2.2 Video Source

Capture for quantigraphic HDR video applications can usually be achieved in three ways: the different exposures can be arranged temporally, spatially, or optically. Temporal exposure change is the most common way and consists of differently exposed frames of the same subject matter: the frame exposure changes at every frame, cycling modulo the number of input images used for composition. Spatial exposure change happens within one frame, with different exposures arranged pixel by pixel. Theoretically, this can be achieved with an optical mask whose density varies per pixel, or with new sensors that have different light sensitivities for different pixels.


Alternatively, optically varying exposure means that multiple synchronized cameras, arranged with a beam splitter, capture the same scene with different exposure settings. This method has the best video quality with the fewest artifacts, but it is more expensive to set up than the other two. These three techniques of quantigraphic HDR capture are summarized as follows:

1. Temporal Exposure Change

• contains consecutive, differently exposed frames,
• requires a high frame rate to reduce motion artifacts,
• needs a window-sliding buffer to keep the same frame rate as the input, and
• requires higher sensitivity for the short capture time, which increases noise.

2. Spatial Exposure Change

• changes pixel exposure within a single frame to remove motion artifacts,
• is usually achieved using a sensor mask or low-level sensor control,
• contains different regular or irregular patterns, and
• composites a pixel window to an HDR pixel at the cost of reduced resolution.

3. Optical Exposure Change

• captures several exposures at once using multiple cameras,
• directs the light from the lens to more than one imaging sensor,
• is costly, but has no motion artifacts, and
• splits the light, so that increased sensor sensitivity is required.

2.3 Camera Response Function Recovery

To recover the photoquantity, we need to know how a camera converts it to a pixel value, i.e., the camera response function. The following methods are quantigraphic: they estimate the camera response function, and the resulting inverse response is used to linearize the pixel values as an anti-homographic filter. These methods require no a priori restrictions on the subject and no knowledge of the scene; the exposures of the different images and their pixel values are known and given to the camera response recovery algorithms [37]. They differ from non-quantigraphic methods, which collect data pairs (q_i(x), f_i(x)) by capturing pixel data from scenes set up with pre-recorded intensities. The accuracy of that calibration procedure depends heavily on the scene setup being very well controlled, and such methods are used only by camera manufacturers [6].


2.3.1 Mann and Picard, 1993

The method developed by Mann in his 1993 paper [20], and later with Picard in their 1995 papers [26], introduced the main ideas of generating HDR images by combining multiple images through a recovered digital camera response function, refined later in [18, 21, 23, 25]. These are regarded as the seed papers for the area of quantigraphic HDR and laid the ground for subsequent research. The response curve was modeled by:

f(q(x)) = α + β q(x)^γ,    (2.8)

based on the density curve of photographic film. The parameter α, representing the minimum density, can be estimated by taking an image with the lens covered and subtracting it from the other images. The model is solved using comparametric equations, based on the proposition given in the paper [21]: when a function f(q(x)) is monotonic, the comparametric plot (f(q(x)), f(k_i q(x))) can be expressed as a monotonic function g(f) not involving q. The response curve is assumed monotonically increasing; as mentioned by the author, the only practical situation likely to violate this assumption is when the rays of light are concentrated on the film or sensor long enough to burn a hole through it. The algorithm assumes a pixel-wise correspondence of scene points, exploiting the exposure variation between images by means of "comparagrams".
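A comparagram is simply the joint histogram of pixel values from two differently exposed, pixel-registered images; the following sketch (Python/NumPy, assuming 8-bit grayscale inputs) is one straightforward way to compute it.

```python
import numpy as np

def comparagram(f1, f2, bins=256):
    """Joint histogram J[a, b] = number of pixels x with f1(x)=a, f2(x)=b.

    f1, f2: registered 8-bit images of the same scene at exposures
    q and k*q. The ridge of this histogram traces the comparametric
    plot (f(q), f(kq)), from which g(f) can be fitted without knowing q.
    """
    J, _, _ = np.histogram2d(f1.ravel(), f2.ravel(),
                             bins=bins, range=[[0, 256], [0, 256]])
    return J
```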

2.3.2 Debevec and Malik, 1997

The method proposed by Debevec and Malik in their SIGGRAPH 1997 paper attracted a lot of attention in the computer graphics research community [7]. It gives a simple solution for estimating the response function with linear optimization. The process takes in the digital values at one pixel for several different known exposures. These pixel-exposure pairs are then plotted on a graph, generating smooth curve fragments. Since the shape of the response function along different parts of the curve is now known, the problem becomes how these fragments fit together. They solved this using a linear optimization that finds a smooth curve minimizing the mean-squared error over the derived response function. The objective function is as follows:

O = Σ_i Σ_x [ w(f_i(x)) ( ln f^{-1}(f_i(x)) − ln q(x) − ln Δt_i ) ]^2    (2.9)

Using singular-value decomposition, they solve the equations separately for each of the RGBchannels, assuming that interactions between the channels can be neglected.
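A compact transcription of this linear solve in Python/NumPy (following the paper's well-known "gsolve" construction, with g = ln f^{-1} sampled at the 256 pixel levels, a hat-shaped weighting, and a second-difference smoothness term; the smoothness weight lam and the choice of sample points are left to the user) might look like:

```python
import numpy as np

def gsolve(Z, log_dt, lam=10.0):
    """Recover g(z) = ln f^{-1}(z) from samples Z[i, j]: the (integer)
    pixel value of sample point j in image i, with log exposure times
    log_dt[i]. Least-squares solve of eq. (2.9) plus smoothness."""
    n = 256
    w = lambda z: np.minimum(z, 255 - z) + 1          # hat weighting
    P, S = Z.shape
    A = np.zeros((P * S + 1 + (n - 2), n + S))
    b = np.zeros(A.shape[0])
    r = 0
    for i in range(P):
        for j in range(S):
            wij = w(Z[i, j])
            A[r, Z[i, j]] = wij                       # g(Z_ij)
            A[r, n + j] = -wij                        # -ln q(x_j)
            b[r] = wij * log_dt[i]                    # ln dt_i
            r += 1
    A[r, 128] = 1.0; r += 1                           # fix g(128) = 0
    for z in range(1, n - 1):                         # smoothness: g'' ~ 0
        A[r, z - 1:z + 2] = lam * w(z) * np.array([1.0, -2.0, 1.0])
        r += 1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n]                                      # g at all 256 levels
```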


2.3.3 Mitsunaga and Nayar, 1999

A polynomial approximation to the response function was used in the paper by Mitsunaga and Nayar [31]. What is worth mentioning about their method is that the camera response function is recovered without knowing the exact exposure ratios between the pairs of input images; in fact, the ratios are estimated together with the response as an output of the algorithm. They use the following polynomial of order M as the model of the inverse response function:

f^{-1}(f_i(x)) = Σ_{m=0}^{M} c_m f_i(x)^m    (2.10)

Thus, to calibrate means to determine all the coefficients c_m and the order M. They minimize the following error function for an image set of size N:

O = Σ_i Σ_x [ Σ_{m=0}^{M} c_m f_i(x)^m − k_i Σ_{m=0}^{M} c_m f_{i+1}(x)^m ]^2,    (2.11)

where the exposure ratio between two consecutive images is k_i. The minimum is found where the partial derivatives with respect to the coefficients are all zero, giving the system of linear equations:

∂O/∂c_m = 0    (2.12)
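One way to set up the minimization of eq. (2.11) for a fixed order M and known ratio k (dropping the paper's iterative refinement of k and M) is a homogeneous least-squares problem over the coefficients c_m, solved with an SVD; the sketch below (Python/NumPy, pixel values normalized to [0, 1]) is illustrative, not the full algorithm.

```python
import numpy as np

def fit_polynomial_response(f_lo, f_hi, k, M=5):
    """Fit c_m in f^{-1}(v) = sum_m c_m v^m from corresponding pixels of
    two exposures, where f_hi is taken at k times the exposure of f_lo,
    so f^{-1}(f_hi) = k * f^{-1}(f_lo) at every pixel (eq. (2.11))."""
    v_lo = f_lo.ravel().astype(float) / 255.0
    v_hi = f_hi.ravel().astype(float) / 255.0
    # One row per pixel: sum_m c_m v_hi^m - k * sum_m c_m v_lo^m = 0
    A = np.stack([v_hi ** m - k * (v_lo ** m) for m in range(M + 1)],
                 axis=1)
    # Nontrivial null-space direction = smallest right singular vector:
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    c = Vt[-1]
    c /= c.sum()          # normalize so f^{-1}(1) = sum_m c_m = 1
    return c
```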

2.3.4 Robertson, Borman, and Stevenson, 1999

The method proposed by Robertson, Borman, and Stevenson in their 1999 paper estimates the camera response function with an iterative optimization process [35]. At each iteration, a better approximation is computed based on the previous estimate. The objective function to be minimized is:

O = Σ_i Σ_x [ w(f_i(x)) ( I_i(x) − k_i q(x) ) ]^2,    (2.13)

where I_i(x) = f^{-1}(f_i(x)) denotes the linearized pixel value.

This equation is minimized by setting the gradient with respect to q(x) equal to zero. This gives:

q̂(x) = [ Σ_i w(f_i(x)) k_i I_i(x) ] / [ Σ_i w(f_i(x)) k_i^2 ].    (2.14)

The certainty function, or well-exposedness, used in this method is based on the confidence in the observed data: the response of the camera is usually most accurate, or most sensitive, towards the middle of the output range. The weighting function chosen by the authors is therefore a Gaussian-like function:

w_i(x) = w(f_i(x)) = exp[ −4 · (f_i(x) − 127.5)^2 / 127.5^2 ].    (2.15)

A similar weighting function based on pixel values is used in Chapter 5 to improve the circuit's speed and area.
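In code, the certainty of eq. (2.15) and the closed-form update of eq. (2.14) translate directly; this Python/NumPy sketch (assuming 8-bit input frames, known ratios k_i, and the current 256-entry linearization LUT f_inv) performs the photoquantity half of one iteration, with the response-update half omitted.

```python
import numpy as np

def certainty(f):
    """Eq. (2.15): Gaussian-like weight, peaked mid-range (f in 0..255)."""
    return np.exp(-4.0 * (f - 127.5) ** 2 / 127.5 ** 2)

def update_photoquantity(frames, k, f_inv):
    """Eq. (2.14): weighted estimate of q(x) given the current response.

    frames: list of uint8 images f_i(x);  k: exposure ratios k_i;
    f_inv:  256-entry array, the current estimate of f^{-1}."""
    num = np.zeros(frames[0].shape)
    den = np.zeros(frames[0].shape)
    for fi, ki in zip(frames, k):
        w = certainty(fi)
        num += w * ki * f_inv[fi]     # I_i(x) via LUT indexing
        den += w * ki ** 2
    return num / np.maximum(den, 1e-12)
```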

2.3.5 Grossberg and Nayar, 2003

In their 2003 paper [10], Grossberg and Nayar relate pixel values from two differently exposed images of the same subject matter using the comparagram, a joint-histogram method created by Mann and Picard. The result is a technique to estimate the camera response more robustly with a histogram-based method. The theorem derived in this paper, named Intensity Mapping Function Properties, proves that a comparagram is sufficient to estimate the camera response function. The intensity mapping function, τ, relates the values in one image to those in the other image, i.e., f_1 = τ(f_2). It behaves very much like a comparametric equation [22]. Suppose we have f_i(x) = f(q_i(x)) as before, and for the two photoquantities q_1 and q_2 in the scene, q_1 = k · q_2. The intensity mapping function, τ, can be derived as

f_1 = f(kq_2) = f( k · f^{-1}(f_2) ) = τ(f_2).    (2.16)

In this paper, they also proved that the method can be accomplished without the images being perfectly registered. Intuitively, the reason is that the comparagram, which is sufficient to recover the camera response, does not carry any spatial information about the scene. As long as the histograms remain roughly constant, they can be used to recover a camera response from image sequences containing both camera motion and object motion in the scene.
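Because the marginal histograms suffice under these assumptions, the intensity mapping function can be sketched by matching cumulative histograms; the following construction (Python/NumPy, 8-bit images of a roughly static scene) is one illustrative reading of that result, not the paper's exact procedure.

```python
import numpy as np

def intensity_mapping(f2, f1, bins=256):
    """Estimate tau with f1 = tau(f2) by cumulative-histogram matching:
    tau = H1^{-1} o H2, where Hi is the cumulative histogram of image i.
    No spatial registration of the two images is required."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, bins))
    h2, _ = np.histogram(f2, bins=bins, range=(0, bins))
    c1 = np.cumsum(h1) / h1.sum()
    c2 = np.cumsum(h2) / h2.sum()
    # For each level of f2, find the level of f1 at the same quantile:
    tau = np.searchsorted(c1, c2, side='left').clip(0, bins - 1)
    return tau                        # 256-entry LUT: f1 ~ tau[f2]
```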

2.4 Comparametric Camera Response Function

The previously proposed methods recover the camera response function and use it to estimate the combined photoquantity arithmetically with a weighted-sum approach. More importantly, each pixel f_i(x) is mapped independently. Taking into account the relation between images of different exposures gives rise to more sophisticated methods, such as the statistical approach proposed in [33] and the per-pixel non-linear optimization method in [9]. These methods are computationally complex and hard to implement directly on an FPGA for a real-time system.

Instead of estimating q_i(x) independently from each exposure and then combining the estimates with weighted-sum composition, the proposed method constructs a pairwise joint estimator F that takes pixels from two exposures. These estimators were previously implemented on GPU as LUTs, called Comparametric Camera Response Functions (CCRFs).

2.4.1 CCRF Composition

A CCRF lookup table is created by sampling every combination of the two tonal ranges. The size of the CCRF defined in this thesis is 1024 × 1024; results are calculated for all combinations (q_1, q_2), with each input spanning 1024 tonal values. Non-linear optimization is used to maximize the probability that q̂ = q, given the observations q_1 and q_2,

P(q̂ = q | q_1, q_2) = P(q) P(q_1 | q) P(q_2 | q) / P(q_1, q_2)    (2.17)
                    ∝ P(q̂ = q) P(q_1 | q) P(q_2 | q).    (2.18)

Based on the Levenberg-Marquardt algorithm, the optimized value of q can then be found according to the following equation,

q̂ = argmin_q [ (q_1 − f(q))^2 / σ_1^2 + (q_2 − f(kq))^2 / σ_2^2 ],    (2.19)

where k is the exposure ratio and each σ is estimated from the interquartile range of the comparagram. Once the CCRF is constructed, we can use it to composite any number of LDR images. Consider first a Wyckoff set with two exposures; q̂(x) is then given by

q̂(x) = F(q_1(x), q_2(x)).    (2.20)

With a pairwise topology, this composition method generalizes to a Wyckoff set of size N. The same LUT can be used in parallel because the exposure ratios are kept constant at the input and at every output stage. Consider the example of compositing three exposures. In this case, the equation becomes

q̂(x) = F( F(q_1(x), q_2(x)), F(q_2(x), q_3(x)) ).    (2.21)
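With a precomputed 1024 × 1024 CCRF table F, the pairwise composition of eqs. (2.20)-(2.21) reduces to repeated 2D lookups; a behavioral sketch (Python/NumPy, assuming the inputs are already quantized to 10-bit photoquantity indices) follows.

```python
import numpy as np

def ccrf_compose(q_list, F):
    """Pairwise CCRF composition for a Wyckoff set of size N.

    q_list: N images of 10-bit photoquantity indices (0..1023),
            ordered by exposure; adjacent pairs share the same ratio k.
    F:      1024x1024 CCRF LUT; F[a, b] is the joint estimate for the
            observations (a, b), assumed here to also be a 10-bit index
            so outputs can feed the next stage.
    """
    level = list(q_list)
    while len(level) > 1:
        # eq. (2.21): the same LUT serves every stage, because the
        # exposure ratio is constant between adjacent inputs.
        level = [F[level[i], level[i + 1]] for i in range(len(level) - 1)]
    return level[0]
```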


2.5 Spatial Tonal Mapping

Once q̂(x) is estimated from the image set, we need to perform dynamic range compression for LDR display. The following function proposed in [30] provides adequate dynamic range compression and works well for general high dynamic range scenes:

Q_i = r · q^{1/k} + d,    (2.22)

where r, k, and d can be used to adjust the contrast and brightness. Another approach, proposed by Gastal et al. [8] in their SIGGRAPH paper, uses an edge-preserving method for tonal mapping. Assume that the number of independent channels c in the input image is three. Then

I'_cx = dI_c/dx = I_c(x + h) − I_c(x),    I'_cy = dI_c/dy = I_c(y + h) − I_c(y),    (2.23)

where h = 1. These partial finite differences are used to compute the domain transform. The domain transform derivatives, ct'_x and ct'_y, are calculated as:

ct'_x(u) = 1 + (σ_s/σ_r) Σ_c |I'_cx(u)|,    ct'_y(u) = 1 + (σ_s/σ_r) Σ_c |I'_cy(u)|,    (2.24)

where the sums run over the c channels.

Here, ct'_x(u) and ct'_y(u) contain information proportional to the difference in pixel values between adjacent pixels along each dimension. It is important to process all channels of the image at once for edge-preserving filtering; processing each channel independently can introduce artifacts near the edges.

The input image is then convolved one-dimensionally twice, with the filter kernels V_x(u) and V_y(u). The horizontal convolution with V_x(u) is performed first on the input image. The result, F_cx, is then convolved vertically with V_y(u) to obtain the output image.

V_x(u) = exp[ (−√2/σ_s) ct'_x(u) ],    V_y(u) = exp[ (−√2/σ_s) ct'_y(u) ]    (2.25)

The effect of the filter scales inversely with the domain transform derivative ct'(u): if pixel u has a large ct'(u) value, an edge occurs at u, so the filtering should have little effect on pixel u, hence preserving the edges.

F_cx(u) = ∫_{−∞}^{∞} I_c(x) V_x(u − x) dx,    F_c(u) = ∫_{−∞}^{∞} F_cx(y) V_y(u − y) dy    (2.26)


The actual implementation of the tonal mapping algorithm will be discussed in Chapter 7. There, the edge-preserving concept is applied using a Gaussian convolution, and high-frequency information from the compressed quantigraphic map is added back to enhance the edges of the image.
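For reference, eqs. (2.23)-(2.26) can be transcribed as the following Python/NumPy sketch of a single horizontal pass (the full filter alternates horizontal and vertical passes). This is a direct, O(W^2)-per-row illustration of the mathematics under the assumption h = 1, not the Chapter 7 circuit, which approximates the kernel with a Gaussian convolution plus edge re-enhancement.

```python
import numpy as np

def domain_transform_x(img, sigma_s, sigma_r):
    """Horizontal pass of the edge-preserving filter, eqs. (2.23)-(2.26).

    img: float array (H, W, C); the C color channels are processed
    jointly, as required for edge preservation. Returns F_cx."""
    H, W, C = img.shape
    # eq. (2.23): partial finite differences with h = 1
    dIx = np.diff(img, axis=1)                        # (H, W-1, C)
    # eq. (2.24): domain transform derivative (>= 1 everywhere)
    ct_x = np.ones((H, W))
    ct_x[:, 1:] += (sigma_s / sigma_r) * np.abs(dIx).sum(axis=2)
    # integrate to obtain the transformed coordinate ct(u) per pixel
    ct = np.cumsum(ct_x, axis=1)
    out = np.empty_like(img)
    for u in range(W):
        # eq. (2.25): kernel weight decays with transformed distance,
        # so weights collapse across strong edges (large ct steps)
        V = np.exp(-np.sqrt(2.0) / sigma_s * np.abs(ct - ct[:, u:u+1]))
        out[:, u, :] = ((V[:, :, None] * img).sum(axis=1)
                        / V.sum(axis=1)[:, None])     # normalized, eq. (2.26)
    return out
```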

2.6 Summary

The flow of quantigraphic image processing is summarized by a patent of Mann's, shown in Fig. 2.1. The imaging process shown in the right part of the figure is implemented in the work discussed in Chapter 5.

Figure 2.1: Mann's 1993 Patent on Quantigraphic HDR Processing - This patent summarizes the original HDR imaging flow, which is also implemented in HDRchitecture as one of the compositing methods.


Chapter 3

HDRchitecture

HDRchitecture is a parameterizable FPGA implementation for real-time quantigraphic HDR video processing. Its hardware architecture is determined by the key processing stages commonly found in HDR photography. The top-level diagram of the quantigraphic HDR hardware architecture is given in Figure 3.1:

[Figure 3.1 block diagram: VIDEO SOURCE → FRAME BUFFER (EV1 … EVn) → QUANTIGRAPHIC COMPOSITION → SPATIAL TONAL MAPPING → DISPLAY CONTROL → LDR DISPLAY]

Figure 3.1: Top-Level Block Diagram of HDRchitecture - This figure shows the major function blocks of HDRchitecture, based on the essential stages typically found in HDR photography or cinematography.

3.1 Capturing Device

The video capture device used for HDRchitecture is a Canon EOS 60D DSLR camera. The firmware of the device is modified such that exposures can be alternated according to the video frame number. The maximum frame rate is 60 FPS, so a small amount of ghosting will be seen due to motion blur. To quantigraphically combine differently exposed images of the same subject matter, video frames with alternating exposures must be captured by the imaging device. The most common method to capture input for HDR processing is exposure bracketing, as shown in Figure 3.2, which changes the exposure setting for each frame. In the example shown, three different settings are used to achieve exposure bracketing on the input video stream. The exposure separation between any two adjacent images is kept constant. If the bracketing size is odd, the photo in the middle is assumed to be properly exposed and its exposure is used as the reference. If the bracketing size is even, the middle two frames are slightly under- and over-exposed relative to the reference exposure.

[Figure 3.2 timeline: INPUT VIDEO STREAM, FRAME 1 … FRAME 6 carrying alternating exposures EV1, EV2, EV3, EV1, EV2, EV3 over TIME]

Figure 3.2: Raw Video Containing Temporally Varying Exposures - This figure illustrates an example of a video that contains a three-exposure Wyckoff set, varying the exposure at every frame.

3.2 Frame Exposure Setting

The exposure settings of a camera comprise aperture, shutter, and film speed (i.e., ISO). When taking a photo, one can therefore use many combinations of these three settings to achieve the same exposure. For HDR video, knowing the trade-offs of each setting is important. Aperture controls the area over which light enters the camera. It also determines the photo's depth of field, which is the range of distance over which objects remain in sharp focus. For real-time quantigraphic HDR applications, changing the aperture setting for each frame is not a feasible option, since aperture settings on the testing camera are realized by a mechanical design with overlapping blades. Shutter speed controls the duration of the exposure and thus affects the motion blur in the frame; to get a sharp image of a moving object, a faster shutter speed is preferred. In order to maintain the frame rate of the video, electronic shutters are used in video recording mode. Depending on the frame rate, the settable shutter speed range is 1/4000 sec to 1/60 sec. Together with ISO speed, this range determines the usable dynamage range of the camera for exposure bracketing. ISO speed, on the other hand, controls the sensitivity of the camera's sensor to a given amount of light. Since higher ISO increases image noise, a lower ISO is always desirable if the exposure level can be obtained using the other two settings. The settable range of ISO is 100 to 6400. Thus, when implementing video bracketing through firmware modification, higher ISO values are used when the shutter speed cannot be maintained within the frame rate.

3.3 HDR Frame Buffer

The frame buffer of HDRchitecture serves as an interface between the video input and the composition circuitry. Its functionality is to prepare video streams for compositing. More specifically, it splits the input video, which contains the N temporally alternating exposed frames, into N video streams that each contain a single exposure setting. The frame buffer needs to be generated by software according to the number of exposures we want to have in the input stream. Thus, the design of the frame buffer must be modularized into parameterizable circuit implementations, each having its own video read/write paths.
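Functionally, the frame buffer performs a bracket-sliding demultiplex: each incoming frame overwrites the stored frame with the same exposure index, and once all slots are filled, a complete Wyckoff set is available after every input frame, preserving the input frame rate. A behavioral Python model (the real design is a DDR3-backed circuit with per-stream read/write paths, not software) is:

```python
class BracketSlidingBuffer:
    """Behavioral model of the HDR frame buffer: turns one temporally
    bracketed input stream into N aligned single-exposure streams."""

    def __init__(self, n_exposures):
        self.n = n_exposures
        self.slots = [None] * n_exposures   # newest frame per exposure index

    def push(self, frame, frame_number):
        # Exposure index cycles with the frame number (see Figure 3.2).
        self.slots[frame_number % self.n] = frame
        # Once every slot is filled, a complete Wyckoff set is available
        # after *every* input frame -- output rate equals input rate.
        if all(s is not None for s in self.slots):
            return list(self.slots)         # hand off to the compositor
        return None
```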

[Figure 3.3 block diagram: VIDEO INPUT → VIDEO INTERFACE 1…i → AXI4 R/W CTRL 1…i → ports P1…Pi → AXI4 INTERCONNECT → DDR3 CTRL]

Figure 3.3: Top-Level Block Diagram of HDR Frame Buffer - This block diagram shows the HDR frame buffer that is used in HDRchitecture to prepare video streams for composition.

As shown in Figure 3.3, the frame buffer consists of four parts. The video interface handles the clock domain crossing between the video data running in the pixel clock domain and the memory clock domain. It also generates the burst data and corresponding memory read/write commands, based on a Finite State Machine (FSM) that monitors read/write operation status. The read/write controller arbitrates reads and writes for a single video channel and also translates the memory commands according to the requirements specified in the Advanced Microcontroller Bus Architecture (AMBA) 4 Advanced eXtensible Interface (AXI4) protocol [43]. A video interface, together with an AXI4 read/write controller, is the minimum single-port circuitry needed to drive the AXI4-compatible DDR3 memory controller core provided by Xilinx. Thus, when several ports are present, an AXI4 Interconnect core is used in an N-to-1 configuration to provide further arbitration among all master ports. The DDR3 memory interface core from Xilinx is then connected to the AXI4 interconnect as a slave device.

3.4 HDR Composition

Previous chapters discussed several ways to recover the camera response curve from a Wyckoff set; this section discusses how the recovered curve can be used on an FPGA for HDR video composition. The camera response is monotonically increasing, so its inverse mapping is well defined. HDRchitecture implements two methods for compositing HDR. The weighted-sum approach, which minimizes resource usage, uses one-dimensional lookup tables to evaluate the discrete inverse camera response function. The other method, called the quadtree comparametric approach, aims for higher video quality and uses more resources to estimate the result of the entire composition process statistically with two-dimensional lookup tables. The compositing architectures of each method are discussed here; the details of the improvements are given in later chapters.

[Diagram: WEIGHTED-SUM - the exposures f1(x)..fi(x) pass through 1D LUTs and an ARITH block to produce q(x). QUADTREE CCRF - the exposures pass through 1D LUTs and then 2D LUTs to produce q(x).]

Figure 3.4: The General Composition Architectures - This figure illustrates the conceptual architecture for composition based on the two methods.

Figure 3.4 shows the two architectures of compositing circuits using the two methods (i.e., weighted-sum and quadtree CCRF) introduced earlier. The weighted-sum approach stores all the values of the inverse camera response function in lookup tables and uses these tables to convert pixel values into quantigraphic light space. Then, according to the given certainty/weight, it uses the implemented arithmetic circuit to calculate an estimate of the photoquantity as a weighted sum. The lookup table implementations can be improved to speed up the estimation and to reduce resource usage. The weighted-sum approach targets small FPGAs, so a compact and efficient architecture is required. On the other hand, the quadtree CCRF approach needs more resources to implement and gives a better estimate of the resulting photoquantity. It first converts pixel values into light space using the same inverse camera response function, implemented using 1D lookup tables. Then it takes the photoquantity values and composites them using 2D lookup tables. These 2D tables, however, need to be compressed, because there is not enough on-chip memory to store the entire table on the FPGA. Quadtree compression is performed on the raw table by examining the sensitivities of the CCRF for each input combination.

As a result, a 2D interpolating memory is implemented to achieve the non-uniform lookup operation. This architecture is a table-based two-dimensional function evaluator, according to the definition given in [34]. In modern FPGAs, on-chip block memory makes table-based arithmetic practical in many applications. It reduces the hardware development cost, since all the results are pre-computed using software programs, and improves the system reliability, since the results are usually stored in read-only memories.

Figure 3.5: Example of the Four-Up Display - This figure shows a three-exposure input on the four-up window display, with the tone-mapped HDR result at the bottom right corner for comparison.

3.5 Spatial Tonal Mapping

In order to properly display HDR frames on commercial 8-bit displays, tone mapping is needed to map the colors of the photoquantities to 8-bit colors so as to approximate the appearance of the composited image. In the case of viewing HDR, appropriate compression and edge enhancement are required to preserve the information after a strong contrast reduction. HDRchitecture implements the tonal mapping method introduced in [8] using both global and local operations. It uses an inverse-power function as the global tone-mapping operator to compress the estimated photoquantity values, and then performs weighted averaging of the colors of neighboring pixels based on their distances in image space to produce convolved image layers. A normalized binary Gaussian kernel is used to compute the result. Edge enhancement is performed using the convolved image layer to preserve the details of the final image. The original software implementation iterates the filter operation and empirically analyzes the convergence of the process. To compare the HDR result with the original exposures, a display controller (called the Four-Up Display) is implemented in the prototype for a three-exposure input stream, as shown in Figure 3.5.

3.6 Summary

To scale the system to support larger numbers of exposures as input to HDR production, issues in several sub-systems of the proposed architecture are discussed below. Assume that the number of exposures we want to combine in the video stream is N.

Camera
The limitation coming from the camera is its dynamic range, which is the ratio between the largest quantity that will not damage the sensor or receiver and the smallest non-negative quantity for which changes in the quantity remain discernible. With a smaller incremental exposure difference, i.e., 1 stop, the maximum number of exposures that can be captured by the camera is more than 10. For the implemented prototype, three exposures with five stops of separation are used.

Frame Buffer
Bandwidth of the memory is the concern here. The frame buffer reads in the input stream and stores the N streams in different DDR3 memory locations for parallel output. Each of the N parallel video streams uses a read/write port with a double-buffering mechanism for stable video output. Thus an arbiter is used to coordinate the 2N + 1 ports so that each of them can access the memory. Based on a calculation performed in an earlier publication [38], the maximum number of ports supported is 8.

Compositing Circuit
Resource usage is the main concern in this circuit. For the weighted-sum approach, a different circuit needs to be manually coded depending on the size of the Wyckoff set. Resource usage of the weighted sum is given in Chapter 5. For the quadtree approach, there are 3N(N−1)/2 compositing instances, grouped in the pairwise structure. Resource usage of each table is given in Chapter 6.


Chapter 4

HDR Frame Buffer

A common mistake when processing a video stream with temporally varying exposures is to take every N consecutive frames and composite them into one HDR frame. As shown in Figure 4.1, the video input has three temporally alternating exposures, with frames numbered 1, 2, 3, and so on. Frames 1 and 2 are stored into memory and then read out in parallel with Frame 3, to be used as input Set 1 for the composition block. The same mechanism takes Frames 4, 5, and 6 as Set 2, and so on. Although very intuitive and straightforward, this reduces the resulting frame rate by a factor of N. In order to maintain the frame rate, a bracket-sliding mechanism is used to produce N separate video streams, each with a single exposure setting. Figure 4.2 gives an example illustrating the operation performed on a 3-exposure video stream. In this example each video frame is stored and repeated according to its exposure setting, as if an exposure bracket were sliding along the original input video stream. In general, for an input video stream that contains N temporally varying exposures, each frame is stored into the corresponding memory location and gets updated as soon as the next frame with the same exposure setting arrives. Each frame is therefore repeated in the resulting video stream N times.

[Diagram: input frames 1-6 on a time axis; frames {1,2,3} grouped as SET 1 and {4,5,6} as SET 2.]

Figure 4.1: A Common HDR Video Buffering Mistake - This figure illustrates the inefficient video bracketing method often used on video input with temporally varying exposures.

The bracket-sliding functionality is provided by the frame buffer. As illustrated in Figure 4.2, it splits the input video, which contains the N temporally alternating exposures, into N video streams, each containing one of the N exposure settings. The frame buffer needs to be generated using software according to the number of exposures we want to have in the input stream. Thus the design of the frame buffer must be modularized into parameterizable circuit implementations, each having its own video input and output.

[Diagram: input frames 1-5 with three alternating exposures; frame 1 is stored in MEM 0, frame 2 in MEM 1, frame 3 in MEM 2, each updated when the next frame of the same exposure (4, 5, ...) arrives; the three outputs repeat each stored frame to form SET 1, SET 2, and SET 3.]

Figure 4.2: Example - Bracket Sliding Operation - This figure illustrates the bracket sliding operation on a video stream with three temporally varying exposures.

As shown previously in Figure 3.3, the frame buffer is built according to a typical multi-port memory controller architecture. The core can be a hard memory controller, which is available in some Xilinx FPGAs such as the Spartan®-6 [42]. It can also be implemented using the soft memory interface Intellectual Property (IP) core provided by Xilinx, called the 7-series memory interface solution/generator (MIG) [44]. Xilinx® has adopted the Advanced Microcontroller Bus (AMBA®) 4 Advanced eXtensible Interface (AXI4) protocol for all their IP cores [43], including the 7-series MIG. The 7-series MIG interfaces the 7 series FPGA user design, through AXI4 slave interfaces, to the DDR3 devices on the board used by HDRchitecture [44]. Thus the design of the HDR frame buffer relies on working with the same protocol used by Xilinx®. The previously used Processor Local Bus (PLB) v4.6 specification is a bus specification that regularizes the shared access between a processor (master) and its peripherals (slaves). AXI4 is more of an interface specification, which defines a point-to-point master/slave relationship instead of a direction of data flow. Therefore, AXI4 can provide more flexibility in interface widths and clocking.

[Diagram: the frame buffer of Figure 3.3 annotated with its three clock domains - HDMI CLK (148.5 MHz) at the video interfaces, SYS CLK (200 MHz) through the AXI4 R/W controllers and AXI4 Interconnect, and MEM CLK (800 MHz) at the DDR3 controller.]

Figure 4.3: Clock Domains of HDR Frame Buffer - This figure illustrates the three clock domains used in the HDR frame buffer.

The setting used for the AXI4 Interconnect is a common degenerate N-to-1 configuration. This configuration occurs when multiple master devices (i.e., video ports) arbitrate for access to a single slave device (i.e., the MIG, in our case). The data-width and clock-rate conversions are also performed in this core. The data width of each slave interface is set to 512 bits, with round-robin arbitration. Each port supports both read and write with the same acceptance.

Besides the HDR frame buffer, all other functional units run in the HDMI clock domain. For videos running at 1080p, the pixel rate is 148.5 MHz, which can be calculated according to the Video Electronics Standards Association (VESA) specification [40]. To calculate the timing parameters, VESA provides an Excel spreadsheet, available from their public site [40]. Most of the components in the HDR frame buffer run at the system clock of 200 MHz. Thus, upon entering, the video signals need to be synchronized to the system clock. At the end of the frame buffer, data are sent into the four DDR3 SDRAMs present on the board. The MIG core takes care of the conversion between the system clock and the DDR3 clock. To summarize, Figure 4.3 shows the different clock domains in the top-level view of the HDR frame buffer.


In the implementation of the video interface and the AXI4 read/write controller, the memory read/write paths are grouped together in each module, as shown in Figure 4.4. In this way, modules can be added when a different bracketing size is needed.

[Diagram: within one video interface / AXI4 R/W controller pair, the memory write path runs from VIDEO INPUT to MEMORY, and the memory read path runs from MEMORY to VIDEO OUTPUT.]

Figure 4.4: Frame Buffer Read/Write Paths - This figure illustrates the read/write paths grouped in the video interface and the AXI4 read/write controller.

The following sections discuss the details of each module. In what follows, only the memory write path is discussed in detail. The read path contains the read address and read data channels, whose signals are almost identical to those of the write path, except that the write path has an additional write response channel.

4.1 Video Interface

The video interface of HDRchitecture serves as the basic functional block to handle clock rate conversion and to form data groups for burst mode. It has similar circuitry for both read and write directions. Figure 4.5 shows the path from video to memory.

Video Interface Signals

Port     Direction  Description
SYNC     input      Video sync signals: VS, HS, and DE.
WSEL     output     Selects between line buffers 1 and 2 for the write operation.
WEN      output     Sets line buffer Port A to write mode when pixel data arrive.
WADDR    output     Write address, incremented as pixels arrive; reset at the end of the frame.
CLK_*    input      Clock inputs for the circuit.
EN_*     input      Enable signals used to choose between line buffers.
WE_*     input      Write enable; Port B's write enable is connected to constant zero.
ADDR_*   input      Addresses of the data, specified based on the data width.
DIN_*    input      The pixel data coming from the camera.
DOUT_*   output     The grouped video data, streamed in bursts to memory.
RSEL     output     Same as WSEL, except for reading; it enables the buffer that is not currently being written.
REN      output     Read enable, controlled by the FSM in the burst controller.
RADDR    output     Read addresses, given by the burst controller.

Table 4.1: Video Interface Signals - This table describes the signals in the video interface presented in Figure 4.5.

The details of the signals are listed in Table 4.1. The video interface consists of an address generator, a double line buffer, and a memory burst controller. The double line buffer is implemented using the true dual-port block RAM resource, which has two completely independent access ports, A and B. Thus one main function it provides is to handle the clock domain crossing between the HDMI clock running at 148.5 MHz (for Full HD) and the system clock running at 200 MHz. It also has different read and write widths for forming the data into large burst groups. The write width of the line buffer is configured to be 32 bits, to hold the RGB pixel data; the read width is configured to be 512 bits (i.e., 16 pixels), to be used in burst mode. In the case of video buffering, Port A of the line buffer always performs write operations, while Port B always performs reads. Both read and write operations are single clock-edge operations: the address is registered on the port, and then the data is loaded (for a read) or stored (for a write). For even greater stability, the double-line-buffering mechanism ensures that each block RAM performs only one operation at a time.

4.1.1 Address Generator

The address generator controls the data flow from the video stream into the double line buffer. It registers several delayed versions of the sync signals. Let S_i denote the output of i chained registers when a sync signal S is the input of the first register; S_i is then a version of S delayed by i clock cycles. These video sync signals are used to make the corresponding sync shots according to their polarities. A sync shot is a one-cycle strobe made from a video sync signal to indicate the arrival of the edge of interest. A positive sync shot S_P indicates the arrival of a positive edge on signal S, and a negative sync shot S_N indicates the negative edge.


[Diagram: the HDMI-domain write path (WRITE ADDR GEN driving WSEL/WADDR/WEN from VS/HS/DE) feeds 32-bit pixel data into LINE BUFF 1 and LINE BUFF 2 (ports EN/WE/ADDR/DIN/DOUT/CLK for each of A and B, with CLK_A at HDMI_CLK = 148.5 MHz and CLK_B at SYS_CLK = 200 MHz); the MEM BURST CTRL drives RSEL/REN/RADDR and outputs 512-bit memory write signals.]

Figure 4.5: Top-Level Block Diagram of the Video Interface - This figure illustrates the video interface based on the operation of the double line buffer.

They can be made with the following two circuits, respectively:

$$S_P = S \wedge \overline{S_1}; \qquad S_N = \overline{S} \wedge S_1. \qquad (4.1)$$

The address generator increments the read/write address on a pixel basis. It starts incrementing on the positive sync shot of the data enable, and resets the counter on the positive sync shot of the vertical sync. It also alternates the double line buffer according to the positive sync shot of the horizontal sync.
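As a minimal Verilog sketch of Equation 4.1 and the address counter just described (module and signal names here are illustrative, not taken from the thesis RTL):

module write_addr_gen #(
    parameter AW = 11               // 2048-entry line buffer -> 11-bit address
)(
    input  wire          clk,       // pixel clock (HDMI domain)
    input  wire          vs, hs, de,// video sync signals
    output reg           wsel,      // line-buffer select, toggles every line
    output reg  [AW-1:0] waddr      // write address within the line buffer
);
    reg vs_d, hs_d, de_d;           // one-cycle delayed syncs (the S1 signals)
    always @(posedge clk) begin
        vs_d <= vs; hs_d <= hs; de_d <= de;
    end

    // Equation 4.1: one-cycle sync shots on the edges of interest
    wire vs_p = vs & ~vs_d;         // positive sync shot of VS
    wire hs_p = hs & ~hs_d;         // positive sync shot of HS
    wire de_p = de & ~de_d;         // positive sync shot of DE

    // Increment on a pixel basis; reset at frame start and line start
    always @(posedge clk) begin
        if (vs_p || de_p) waddr <= {AW{1'b0}};
        else if (de)      waddr <= waddr + 1'b1;
    end

    // Alternate the double line buffer once per line
    always @(posedge clk) begin
        if (vs_p)      wsel <= 1'b0;
        else if (hs_p) wsel <= ~wsel;
    end
endmodule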

4.1.2 Double Line Buffer

The line buffer performs the clock domain crossing and word-length conversion for each video line. The goal of doubling the line buffers is to ensure that each block RAM is free of read/write conflicts. The operation timing is shown in Figure 4.6. Assuming that at a certain time Line i of the input stream is written into Buffer 1, it does not get read out to the output until the next line (Line i + 1) starts being written into Buffer 2. The read/write modes alternate between the two buffers based on the video sync signals, thus ensuring the stability of the block RAM.

The double line buffer consists of two block RAMs side by side, with only one of them writable at a time. In the memory-write path shown in Figure 4.5, it sits between the write-address generator and the memory burst controller.


[Diagram: input lines i, i+1, i+2, i+3 are written alternately into Buffer 1 and Buffer 2; each buffer holds its line while the other is being written, so the output reads each line with a one-line delay.]

Figure 4.6: Double-Line Buffer Operation - This figure illustrates the double line buffer operation used in the video interface.

The WSEL signal toggles at every line based on the HS signal. The same video sync is reproduced with a synchronizer in the system clock domain and used to coordinate the memory burst controller. The RSEL signal thus also toggles at every line, while making sure it does not enable the buffer that WSEL is using. In our case, each block RAM port is used in only one direction, so the WE_B signals of the two block RAMs are connected to ground; DOUT_A and DIN_B are not used for the same reason. The address generator increments when active pixels are present, based on the data enable signal from the source. The entire line, consisting of 1920 pixels of 32 bits, is stored into the line buffer. The true dual-port block RAM in 7 series FPGAs extends this flexibility to reads and writes, where each individual port can be configured with a different data bit width. Thus block RAM Port A is configured as 32-bit addressable with a depth of 2048, while the word of Port B is set to be 512 bits wide with a depth of 32 × 2048/512 = 128. The read operation is controlled by the acknowledgement from the memory: the read address ADDR_B increments after every indication of a successful write.
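A behavioral sketch of one such line buffer follows (in the actual design this maps onto a true dual-port block RAM primitive with asymmetric port widths; the model below is only illustrative, and a synthesis tool may not infer the primitive from it):

module line_buffer (
    input  wire         clk_a,      // HDMI clock domain (write side)
    input  wire         we_a,
    input  wire [10:0]  addr_a,     // 2048 x 32-bit words
    input  wire [31:0]  din_a,
    input  wire         clk_b,      // system clock domain (read side)
    input  wire         en_b,
    input  wire [6:0]   addr_b,     // 128 x 512-bit words
    output reg  [511:0] dout_b
);
    reg [31:0] mem [0:2047];

    always @(posedge clk_a)
        if (we_a) mem[addr_a] <= din_a;

    // Port B reads 16 consecutive 32-bit words as one 512-bit burst word.
    integer i;
    always @(posedge clk_b)
        if (en_b) begin
            for (i = 0; i < 16; i = i + 1)
                dout_b[32*i +: 32] <= mem[{addr_b, 4'd0} + i];
        end
endmodule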

4.1.3 Memory Burst Controller

Although not illustrated in Figure 4.5, all the video timing signals are registered in the system clock domain and used to coordinate the controller FSM. The state diagram is given in Figure 4.7. The FSM is reset to State IDLE at the end of the frame according to the vertical sync (registered in the system clock domain). As soon as the entire line has been written into the line buffer, the FSM goes to State REQ. Here, instead of waiting for HS, a negative sync shot of DE is used, since HS arrives later than the end of the active line data.


[Diagram: FSM states IDLE, REQ, ACK, CHK, and DONE; transitions labeled DE SYNC (IDLE to REQ), WRITE ACK (ACK to CHK), BURST COUNT DONE (CHK to DONE), and VS SYNC (reset to IDLE).]

Figure 4.7: State Diagram of the Memory Burst FSM - This figure illustrates the operation of the FSM states in the memory burst controller.

State REQ sends the memory write request and then goes to State ACK to wait for the acknowledgement from the memory. Once it arrives, the FSM goes to State CHK. In this state, the memory burst controller starts issuing writes and counts the number of bursts performed. The controller issues memory burst commands for storing each 512-bit word from Port B of the line buffer. A fixed count of four 32-burst groups is used per line, since 4 × 32 × (512/8) = 8192 bytes holds one padded video line. In other words, when write transactions are grouped into 32-burst groups, this count defines the number of such groups required to finish writing one video line. Here, the video line size is set to 2048 words in order to hold 1920 pixels of 32 bits each. At the end of the four burst groups, the FSM goes to State DONE and refreshes the read address used on the line buffer. It then goes back to IDLE to wait for the next line.
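A minimal sketch of this FSM follows (handshake signal names are illustrative stand-ins, and the per-group looping shown here is a simplification of the burst counting described above):

module burst_fsm (
    input  wire clk,
    input  wire rst,          // registered VS sync shot: reset at end of frame
    input  wire line_done,    // negative sync shot of DE: line fully buffered
    input  wire wr_ack,       // write acknowledgement from the memory
    output reg  wr_req        // memory write request
);
    localparam IDLE = 3'd0, REQ = 3'd1, ACK = 3'd2, CHK = 3'd3, DONE = 3'd4;

    reg [2:0] state;
    reg [1:0] grp_cnt;        // counts the four 32-burst groups per line

    always @(posedge clk) begin
        if (rst) begin
            state   <= IDLE;
            grp_cnt <= 2'd0;
        end else case (state)
            IDLE: if (line_done) state <= REQ;   // entire line buffered
            REQ:  state <= ACK;                  // write request issued
            ACK:  if (wr_ack) state <= CHK;      // wait for acknowledgement
            CHK:  if (grp_cnt == 2'd3) state <= DONE;
                  else begin grp_cnt <= grp_cnt + 2'd1; state <= REQ; end
            DONE: begin grp_cnt <= 2'd0; state <= IDLE; end // refresh read addr
            default: state <= IDLE;
        endcase
    end

    always @(*) wr_req = (state == REQ);
endmodule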

The output of the memory burst controller is not yet AXI4 compatible; it is only the basic logic to monitor and complete a series of writes in burst mode. These output signals, however, can easily be translated to coordinate with other, non-AXI4 memory controller interfaces (such as the Memory Controller Block core for the Spartan®-6 [42]). The translation is given in the next section, where the AXI4 read/write controller is introduced.

The address for storing each pixel's data is generated by the video interface. Since we have a burst size of 32, the byte address needs to be incremented by 2048 at the end of every 32-burst group, which is calculated as 2048 = 32 × (512/8). For storing an entire HD frame, at least 1080 lines must be stored; thus, 2048 line addresses are allocated for each frame. This uses up the lower 25 bits of the address. The next few bits define the frame address for different exposures and are passed in from the top-level module, depending on the number of exposures present in the input stream. Thus frame data with different exposures are stored into different memory locations, as shown earlier in Figure 4.2. For a common setting of three or four exposures, there is still a lot of memory space left (out of 8 Gbit in total). Thus, a double frame buffer mechanism is also used, by alternating the most significant bit of the address based on the vertical sync signal.

The double-frame buffering mechanism is used in the graphics card industry for drawing graphics without flickering or tearing; it aims to improve the perceived performance of graphics applications. When a video frame gets updated, the entire frame is erased before the new frame is drawn. This creates an intermediate blank image, seen by the user as flickering. In a single-buffer case, the blank time includes the time to clear and refill the buffer, in addition to the time to fill the screen; double-frame buffering reduces this blanking time, and hence the flickering. The other issue is known as tearing. It occurs when a frame update is momentarily visible as a horizontal divider between the current frame and the previous frame. With double buffering, screen tearing is less noticeable, since more than two frames finish rendering during the same refresh interval: the screen shows several narrower tears instead of a single wider one.

To summarize, the 32-bit address is divided into different segments and generated in the HDR frame buffer as follows (a Verilog sketch of this address generation is given after the list),

• At the end of each 32-transfer burst group, addr[13:0] is incremented by 14'd2048. 14 bits are used here for holding an entire video line, which requires four groups of bursts.

• At the end of each video line, addr[24:14] is incremented by 1. 11 bits are used for storing an entire frame consisting of 1080 video lines.

• Depending on the number N of exposures in the input video, addr[26:25] is used differently for each exposure. The 2-bit exposure address allows a maximum of four exposures because of the memory capacity. If the double frame buffer is not implemented, the maximum number of exposures extends to 8, as limited by the DDR3 bandwidth.


• At the end of every N frames, addr[27] is toggled so that the double-frame mechanism is achieved.
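Under the segmentation above, the address generation could be sketched in Verilog as follows (module and signal names are hypothetical; the exposure index is assumed to come from the top-level module):

module addr_segments (
    input  wire        clk,
    input  wire        burst_done,   // end of one 32-transfer burst group
    input  wire        line_done,    // end of one video line
    input  wire        frame_wrap,   // end of every N frames (from VS counting)
    input  wire [1:0]  exp_idx,      // addr[26:25]: exposure index (up to 4)
    output wire [31:0] awaddr
);
    reg [13:0] burst_off = 14'd0;    // addr[13:0], += 2048 bytes per group
    reg [10:0] line_addr = 11'd0;    // addr[24:14], one entry per video line
    reg        dbl_buf   = 1'b0;     // addr[27], double-frame buffer select

    always @(posedge clk) begin
        if (line_done) begin
            burst_off <= 14'd0;
            line_addr <= line_addr + 11'd1;
        end else if (burst_done)
            burst_off <= burst_off + 14'd2048;
        if (frame_wrap) begin        // overrides the line increment above
            line_addr <= 11'd0;
            dbl_buf   <= ~dbl_buf;
        end
    end

    assign awaddr = {4'b0, dbl_buf, exp_idx, line_addr, burst_off};
endmodule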

4.2 AXI4 Read/Write Controller

The AXI4 read/write controller buffers the data and translates the memory read/write commands onto AXI4-compatible slave interfaces. It makes the buffer design more flexible: without the AXI4 Interconnect, it can be connected directly to the memory controller to be used as a single-port video frame buffer.

Table 4.2 shows the signals of the three AXI4 write channels (address, data, and response) and their configuration in HDRchitecture. The descriptions of the AXI4 signals are given specifically for their use in HDRchitecture and are not intended to cover all the details of the AXI4 protocol in the original AXI4 specification [11].

Write Address Channel Signals

Signal Width Description

AWID 4 Write Address ID - This identification tag is set to 4'd0 because it connects to only one slave interface of the AXI4 Interconnect.

AWADDR 32 Write Address - This input indicates the address of the first transfer in a write burst. It is given by the connecting video interface.

AWLEN 8 Burst Length - It gives the exact number of transfers in a burst. The burst length is set to 8'd32 for 32 transfers, each of 512 bits.

AWSIZE 3 Burst Size - Burst size indicates the maximum number of bytes in each transfer. If the AXI bus is wider than the burst size, the AXI interface must determine from the transfer address which byte lanes of the data bus to use for each transfer. It is set to 3'b110 for a maximum size of 64 bytes (i.e., 512/8 = 64).


AWBURST 2 Burst Type - This signal determines how the address for each transfer is calculated. The burst type is set to 2'b01, incrementing. This ensures the address for each transfer in the burst is an increment of the address of the previous transfer; the increment value is determined by the size of the transfer.

AWLOCK 1 Lock Type - It provides additional information about the atomic characteristics of the transfer. It is not used and is set to 1'b0.

AWCACHE 4 Memory Type - The memory type is set to 4'b0011 for Normal Non-cacheable Bufferable. Data can be obtained from a write transaction that is still progressing to its final destination; read data returned in this manner does not indicate that the write transaction is visible at the final destination.

AWPROT 3 Protection Type - This signal is set to 3'b000 to indicate non-privileged, secure data access.

AWQOS 4 Quality of Service (QoS) - The QoS identifier sent for each write transaction is set to 4'b0000 and not used.

AWVALID 1 Write Address Valid - This signal indicates that the channel is signalling a valid write address and control information for the slave interface.

AWREADY 1 Write Address Ready - This signal indicates that the slave is ready to accept an address and the associated control signals.

Write Data Channel Signals

Signal Width Description

WDATA 512 Write Data - The write data width is the same as that of Port B of the double line buffer. It is buffered by a FIFO in the AXI4 read/write controller and read out when WREADY indicates the slave is ready to accept write data.

WSTRB 64 Write Strobes - This signal indicates which byte lanes hold valid data. Since there should be one write strobe bit for every eight bits of write data, this signal is set to {64{1'b1}}.


WLAST 1 Write Last - This signal indicates the last transfer in a write burst. It is generated by monitoring a burst counter that increments on every successful transfer; the total number of transfers in a burst (i.e., 32) is given by the video interface.

WVALID 1 Write Valid - This signal indicates that valid write data and strobes are available on the slave interface. It is given by the video interface according to the FSM state.

WREADY 1 Write Ready - This input signal indicates that the connecting slave is ready to accept the write data. It is controlled by the AXI4 Interconnect. This signal is also used to enable reads on the data FIFO.

Write Response Channel Signals

Signal Width Description

BID 4 Response ID Tag - This is the ID tag of the write response. It is ignored in the HDR frame buffer implementation.

BRESP 2 Write Response - This indicates the status of the write transaction. It is also ignored in the HDR frame buffer implementation.

BVALID 1 Write Response Valid - This indicates that the channel is signaling a valid write response. It is given by the connecting slave.

BREADY 1 Write Response Ready - This signal indicates that the AXI4 read/write controller can accept a write response. It is driven by the read enable of the data FIFO.

Table 4.2: AXI4 Write Channel Signals - This table describes the signals in the three write channels for the AXI4 read/write controller.
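As an illustration of how the video interface drives these channels, here is a minimal sketch of the write address channel handshake (names are illustrative; note that the AXI4 specification encodes AWLEN as the number of transfers minus one, so a 32-beat burst is requested with the value 31):

module axi4_aw_sketch (
    input  wire        clk,
    input  wire        issue,        // request from the video interface FSM
    input  wire [31:0] addr,
    output reg         awvalid = 1'b0,
    input  wire        awready,
    output wire [31:0] awaddr,
    output wire [7:0]  awlen,
    output wire [2:0]  awsize,
    output wire [1:0]  awburst
);
    assign awaddr  = addr;
    assign awlen   = 8'd31;       // AWLEN = transfers - 1 for a 32-beat burst
    assign awsize  = 3'b110;      // 64 bytes (512 bits) per transfer
    assign awburst = 2'b01;       // INCR burst type

    // AXI4 rule: once asserted, AWVALID must stay high until AWREADY.
    always @(posedge clk)
        if (issue)                   awvalid <= 1'b1;
        else if (awvalid && awready) awvalid <= 1'b0;
endmodule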

4.3 Resource Usage

The resource usage of the HDR frame buffer is summarized in Table 4.3. These results are from a synthesized prototype using a Kintex®-7 FPGA. The input video has a Wyckoff set of size three; thus the AXI4 Interconnect IP is configured with three ports.


Resources
Circuit               Slice LUT   Slice Reg   Block RAM   DSP
Video Interface       721         1056        56          0
AXI4 R/W Controller   429         1465        16          0
AXI4 Interconnect     21496       32142       18          0
DDR3 Controller       12068       10039       0           0

Table 4.3: Resource Usage of the HDR Frame Buffer.

There is only one slave interface used for the DDR3; thus the resource utilization of the Interconnect depends on the number of master interfaces. The detailed resource growth as a function of the number of master ports can be found in [45]. The DDR3 controller IP generated using the vendor tool always uses the same resources; this is also summarized in the table.


Chapter 5

Composition with Weighted-Sum

In this chapter, the compositing architecture of the traditional weighted-sum approach is presented. This architecture contains two stages, as shown earlier in Figure 3.4. The inverse camera response function values and the certainties from different algorithms are pre-calculated and stored into lookup tables. An arithmetic circuit then takes values from the lookup tables, addressed by the current pixel values, to perform the weighted-sum calculation. The weighted-sum approach is perhaps the most common method for compositing HDR quantigraphically. The FPGA implementation of this stage, targeting small FPGAs, was designed to preserve the precision and maximize the speed of the arithmetic operations. Except for the camera responses, all the other components in the circuit are the same for each of the three RGB channels.

The work presented in this chapter has been demonstrated in IEEE CCECE 2012 [30],ACM SIGGRAPH 2012 [24], and ACM Multimedia 2012 [17].

5.1 Straight-Forward Implementation

According to Mann [22], a weighted-sum approach for composition is given as follows,

$$q(x) = \frac{\sum_i c_i q_i(x)}{\sum_i c_i} = \frac{\sum_i c(f^{-1}(f_i(x)))\, k_i f^{-1}(f_i(x))}{\sum_i c(f^{-1}(f_i(x)))} \qquad (5.1)$$

In this original weighted-sum approach, the $c(q_i(x))$ are the spatially varying certainty functions, which act on the domain $q_i(x)$ of the camera response function. The domain certainty is a function of the photoquantity $q_i(x)$ and can be derived by taking the derivative of the shifted camera response function (i.e., shifted up or down by an amount $k_i$) with respect to $q_i(x)$.

A straightforward way to implement the weighted-sum approach on FPGA is illustrated in the block diagram shown in Figure 5.1. For an input video stream with N different exposures, this architecture maps each pixel value $f_i(x)$ to the corresponding photoquantity $q_i(x)$ and combines them arithmetically to give the output photoquantity $q(x)$.


[Diagram: each exposure f1(x)..fi(x) passes through an inverse-response lookup f^-1(x) and an exposure-ratio multiplier k_i to give q_i(x); certainty lookups c(x) weight each q_i(x), the weighted values and certainties are summed separately, and a divider produces q(x).]

Figure 5.1: Weighted-Sum Composition Circuit Before Improvement - The block diagram of the weighted-sum compositing circuit before improvement.

The N video input streams, running in parallel with different exposures, are split by the HDR frame buffer. As shown in Figure 5.1, each of them needs to be mapped into light space using the inverse camera response function $f^{-1}(x)$. Since each pixel value $f_i(x)$ is discrete, with $f_i(x) \in [0, 255]$, a lookup table can be used to hold all the values of the discrete inverse camera response function for mapping each pixel. If a 16-bit photoquantity is produced, the size of the lookup table is 16 bit × 256 = 4096 bit.
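A single inverse-response lookup table of this size can be sketched as follows (the initialization file name is hypothetical; its contents would be pre-computed in software):

module inv_response_lut (
    input  wire        clk,
    input  wire [7:0]  pixel,     // 8-bit pixel value f_i(x)
    output reg  [15:0] q          // 16-bit photoquantity estimate
);
    reg [15:0] rom [0:255];
    initial $readmemh("inv_crf.hex", rom);   // scaled values of f^{-1}

    always @(posedge clk)
        q <= rom[pixel];          // registered read infers block RAM
endmodule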

All photoquantities estimated from the exposures, $q_i(x) = k_i f^{-1}(f_i(x))$, need to be adjusted to the same exposure level by multiplying by the exposure ratio $k_i$. Since each exposure ratio (expressed using f-stops) is a power of 2, the multiplication can be implemented using shifters. For the certainties $c(f^{-1}(f_i(x)))$, each 16-bit photoquantity is used to address the certainty lookup table. Instead of using the full address range, the depth and the width of the certainty lookup table can be reduced as long as the perceived performance of the output video is maintained. With all the values of $c_i$ and $q_i(x)$, the final estimate of the photoquantity $q(x)$ can be calculated using multipliers, adders, and a divider, according to Equation 5.1.

The problem with the straightforward implementation comes from the certainty lookup table. First of all, its depth depends on the word-lengths of the values returned from the previous stage; these per-exposure photoquantities are usually more than 16 bits. Secondly, since the photoquantity is used as the address, it takes two lookup stages in total before the value is retrieved. In order to improve the area and speed of the implementation, multiple consecutive lookup and arithmetic operations should be combined by slightly altering the compositing equation.

5.2 Adjusting Equation for Improvement

In order to reduce the resource usage of the implementation and also to improve its speed, several architectural decisions have been made. The photoquantity certainty represents the sensitivity of the sensor across the entire camera response function and is used to determine how well the current pixel is exposed. Since this is directly reflected by the pixel values (i.e., the range of the camera response function), we can use the idea of range certainty (i.e., pixel weight) and save one lookup stage. Empirically, the camera response is most sensitive towards the middle of its range, i.e., 128 for 8-bit pixels. The sensitivity, or well-exposedness, decreases as the pixel value moves towards either end (i.e., 0 or 255). Thus a Gaussian function can be adopted to weight a normally exposed image. More assumptions can be made to adjust the weighting function, but the key point here is to convert the photoquantity-certainty function $c(f^{-1}(f_i(x)))$ into a pixel-weight function $w(f_i(x))$. Since both are now addressed by the 8-bit pixel values, the two lookup tables for the inverse camera response $f^{-1}(x)$ and the pixel weight $w(x)$, together with the multiplication by the exposure adjustment $k_i$, can be combined into one single lookup table $F(f_i(x)) = w(f_i(x))\, k_i f^{-1}(f_i(x))$. The changes made to the compositing equation for implementation speed and area are summarized as follows,

$$q(x) = \frac{\sum_i c(f^{-1}(f_i(x)))\, k_i f^{-1}(f_i(x))}{\sum_i c(f^{-1}(f_i(x)))} \approx \frac{\sum_i w(f_i(x))\, k_i f^{-1}(f_i(x))}{\sum_i w(f_i(x))} = \frac{\sum_i F(f_i(x))}{\sum_i W(f_i(x))} \qquad (5.2)$$

This equation can be implemented efficiently on FPGA, where:

• The pixel values $f_i(x)$ come from the differently exposed frames. They are used as 8-bit addresses to look up the corresponding results from the lookup tables.
• The weighted response $F(f_i(x)) = w(f_i(x))\, k_i f^{-1}(f_i(x))$ is the product of the inverse camera response, the exposure compensation, and the corresponding weight. These pre-computed results are stored in block RAM lookup tables for each RGB channel.


[Diagram: each pixel f1(x)..fi(x) addresses its combined tables F_i(x) and W_i(x); the F outputs and W outputs are summed separately, and a divider produces q(x).]

Figure 5.2: Improved Weighted-Sum Compositing Circuit - The block diagram of the improved weighted-sum compositing circuit.

The circuit latency and resources for the multiplication can thus be omitted.
• The pixel weight $W(f_i(x)) = w(f_i(x))$ is the weighting used for the pixel values in each exposure. A divider is used after the two results have been generated by summing the $F(f_i(x))$ and $W(f_i(x))$ values.

As shown in Figure 5.2, the pixel values from the differently exposed images are sent as inputs. Each pixel value is used as an address to look up the corresponding weight and response from a lookup table. The photoquantity $q(x)$ is then estimated through fixed-point computation.
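A minimal sketch of this improved datapath for a three-exposure input and one color channel is shown below (table file names are hypothetical, and the divider is written behaviorally where the prototype would instantiate a pipelined fixed-point divider core):

module weighted_sum3 (
    input  wire        clk,
    input  wire [7:0]  fb, fm, fd,     // pixels from the three exposures
    output wire [31:0] q               // estimated photoquantity q(x)
);
    // Pre-computed tables (contents generated in software and loaded at
    // build time; file names are hypothetical).
    reg [31:0] F_b [0:255], F_m [0:255], F_d [0:255];  // F_i(x)
    reg [15:0] W_b [0:255], W_m [0:255], W_d [0:255];  // W_i(x)
    initial begin
        $readmemh("f_b.hex", F_b); $readmemh("f_m.hex", F_m);
        $readmemh("f_d.hex", F_d); $readmemh("w_b.hex", W_b);
        $readmemh("w_m.hex", W_m); $readmemh("w_d.hex", W_d);
    end

    reg [33:0] sum_f;                  // headroom for three-way sums
    reg [17:0] sum_w;
    always @(posedge clk) begin
        sum_f <= F_b[fb] + F_m[fm] + F_d[fd];
        sum_w <= W_b[fb] + W_m[fm] + W_d[fd];
    end

    // Written behaviorally here; the tables force the LSB of each weight
    // to 1, so sum_w is never zero.
    assign q = sum_f / sum_w;
endmodule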

5.3 Weighting Schemes

In HDRchitecture, different weighting schemes can be used when the reference exposure is chosen differently and when the number of exposures in the input is set to a different value. In an input stream where the constant exposure separation between frames is small, three Gaussian functions with shifted centers can be used as the weighting functions. For the larger exposure separations (4 or 5 stops apart) used in a daily scene, most of the darker/brighter frame contains information that is not present in either of the other two images. Thus we know the darker/brighter region of the darker/brighter frame will mostly contain no useful information. Assuming the medium exposure is properly exposed and used as the reference exposure, two sigmoid functions are used for the frames that are either over- or under-exposed, as shown in Figure 5.3.


[Plot: three certainty curves over pixel values 0-255 - a rising sigmoid for the darker exposure, a Gaussian centered at 128 for the medium exposure, and a falling sigmoid for the brighter exposure; vertical axis: certainty, 0 to 1.]

Figure 5.3: Weighting Scheme Example - This example shows certainties used for a three-exposure Wyckoff set with a large exposure separation.

The exposure weighting function used for frames with medium exposure is modeled bya Gaussian distribution centered at 128.

$$w_m(x) = \exp\left[-\frac{(x/255 - \mu)^2}{2\sigma^2}\right] \qquad (5.3)$$

As mentioned before, a reasonable weighting estimate for the brighter part of an under-exposed image should be higher than that of the mid-tone. Thus, while a Gaussian distribution is used for the medium exposure, with the highest weight assigned at pixel value 128, two sigmoid functions derived from logistic functions are used for the over-exposed frame,

$$w_b(x) = \frac{1}{1 + \exp\left[(x - 127)/\nu\right]}, \qquad (5.4)$$

and the under-exposed frame,

$$w_d(x) = \frac{1}{1 + \exp\left[(127 - x)/\nu\right]}. \qquad (5.5)$$

Furthermore, to avoid divide-by-zero errors, a small offset is applied (i.e., the LSB of the result is set to 1'b1) for all pre-computed values in the lookup tables. In the weighting functions used above, ν, σ, and µ are given empirically as 15, 0.125, and 0.5.
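As a quick numerical check of these parameters (a worked example, consistent with the peak values reported in Table 5.1 below):

$$w_d(255) = \frac{1}{1 + \exp[(127 - 255)/15]} = \frac{1}{1 + e^{-8.53}} \approx 0.9998,$$

and likewise $w_m(127) = \exp\left[-(127/255 - 0.5)^2/(2 \cdot 0.125^2)\right] \approx 0.9999$.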

5.4 Determine the Word-Lengths

In order to determine the word-length of each signal in the compositing circuit, simulations are performed to keep track of all the peak values. These results are used as references to scale each signal into its fixed-point representation.


Signal      Peak     argmax fb   argmax fm   argmax fd
F_b(x)      0.0169   105         –           –
F_m(x)      0.1188   –           140         –
F_d(x)      3.9992   –           –           255
W_b(x)      0.9998   0           –           –
W_m(x)      0.9999   –           127         –
W_d(x)      0.9998   –           –           255
ΣF_i(x)     4.1350   105         140         255
ΣW_i(x)     1.9999   127         127         127
q           3.9983   255         255         255

Table 5.1: Simulation Results for Determining the Word-Lengths.

This provides good control over the variable word-lengths used in the datapath of the fixed-point computation circuit. Note that although $\sum W_i(x)$ reaches 2.9995 when $f_b = 0$, $f_m = 127$, and $f_d = 255$, this combination cannot occur in a real input video stream when a constant exposure separation is specified between exposures.

5.5 Result of Weighted-Sum Implementation

The system is prototyped using the Atlys™ development board, which contains a Xilinx Spartan®-6 LX45 FPGA. The results discussed here for the weighted-sum implementation are generated using a video stream with three exposures, where each 8-bit pixel is mapped to a 32-bit photoquantity. The width of the lookup tables used for certainty/weight is set to 16 bits and the depth to 8 bits. Thus, for comparison between the two versions of the implementation, the certainty/weight lookup tables are addressed either by the 8-bit pixel or by the most significant 8 bits of the photoquantity.

5.5.1 Resource Usage

The resource usage of the implementation comes from block RAM and arithmetic. The block diagrams in Figures 5.1 and 5.2 show the compositing circuit for only one color channel. In the synthesized circuit, separate lookup tables of the same size are used for each color; thus the resource usage is three times larger than what is shown in the diagrams. Table 5.2 summarizes the total resources needed to implement the 3-color-channel compositing circuit for an input with three exposures.


Resources
Circuit                    Slice LUT   Slice Reg   Block RAM   DSP
Before Improvement         2649        2752        18          9
After Improvement          2585        2697        18          0
Total Resource Available   54576       27288       232         58

Table 5.2: Resource Usage of the Weighted-Sum Compositing Circuits.

A slightly higher usage of slice LUTs and registers is shown for the implementation before the improvement, because longer shift registers are needed to delay the datapath signals. The extra latency comes from the use of the pipelined multiplier, which takes 8 cycles to stream out its result. The multipliers are implemented using the DSP slices (DSP48A1) on the FPGA; each 16 bit × 16 bit multiplier takes one DSP slice [13]. Both circuits use the same amount of block RAM: each implementation uses 9 lookup tables of size 32 × 256 for the inverse camera response functions and 9 lookup tables of size 16 × 256 for the weighting functions. The block RAM (RAMB8BWER) used on the Spartan®-6 has a size of 9 Kb [12]. Thus, each implementation uses 18 block RAMs on the FPGA.

5.6 Conclusions

The weighted-sum compositing circuit runs at 143 MHz before improvement and 166 MHz after the equation adjustment. Targeting small battery-powered FPGAs, this implementation is miniaturized to be used on ordinary wearable eyeglasses, as shown in Figure 5.4. A goal of the weighted-sum implementation is to show the development of HDR eyeglasses as a general-purpose seeing aid for everyday life.

Figure 5.4: Weighted-Sum Composition Used on EyeTap - Example showing HDR used on a welding helmet or EyeTap device [28].


Chapter 6

Composition with Quadtree CCRF

In this chapter, the compositing architecture of the quadtree CCRF approach is presented. This architecture contains two stages, as shown earlier in Figure 3.4. In this method, the composition results for all pixel combinations are pre-calculated and stored into a two-dimensional lookup table. The table is then compressed using a quadtree algorithm in software. The circuit uses a multiplexer network to find the addresses of the corresponding pixel pairs; each address is then used to retrieve values from the lookup tables for further interpolation.

The work discussed in this chapter has been presented in IEEE ISTAS 2013 [1], andIEEE CCECE 2014 [38].

6.1 Quadtree CCRF Composition

In [2], a novel method for HDR compositing was presented. This method makes extensive use of a two-dimensional lookup table called the Comparametric Camera Response Function (CCRF). Each CCRF is computed as a two-dimensional inverse camera response function such that, given two input pixel values, it returns a refined estimate of the scene's photoquantity, even when the number of input exposures is high (the conventional weighted-sum approach behaves poorly in this case because it estimates each photoquantity independently).

Every CCRF has the domain of a comparagram [28] and the range of a camera response function. The CCRF is used for HDR compositing by giving a pairwise estimate of the photoquantity q(x) from two pixel values f_i(x) and f_{i+1}(x) taken from consecutively exposed frames with exposure difference ∆EV. On systems without significant memory constraints (e.g., desktops with GPUs), the CCRF can be stored in a two-dimensional lookup table and expanded to any number of exposures using the pairwise topology introduced earlier in Chapter 2. As illustrated in Figure 6.1, by concatenating the two pixel values from consecutive exposures, an address can be formed to look up the precomputed value in a one-dimensional lookup table. The latency needed for the lookup depends on the choice of storage. For a higher operating frequency, multiple CCRF lookup tables are instantiated in parallel, since a constant exposure separation ∆EV can be maintained throughout the process.

[Diagram: left - the concatenated pair {f1(x), f2(x)} forms ADDR into a CCRF LUT that outputs q(x); right - the pairwise topology for three exposures, with CCRF LUTs combining f1, f2, and f3 at separation ∆EV and a final CCRF LUT producing q(x).]

Figure 6.1: The Pairwise Topology of CCRF-Based Composition - Left: CCRF-based compositing using a single lookup table. Right: The pairwise topology of CCRF-based compositing for multiple exposures.

6.2 Resource Limitation

CCRF lookup tables require a huge amount of memory. For a two-dimensional lookup table addressed by two W-bit values, with W-bit outputs (where W is usually more than 16), the size S needed to store a single CCRF lookup table is,

$$S = W \cdot 2^{W+W}. \qquad (6.1)$$

The number of pairwise lookup table instantiations P depends on the size N of the input Wyckoff set, as follows,

$$P = \frac{3N(N-1)}{2}. \qquad (6.2)$$
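For concreteness, a worked example following directly from Equations 6.1 and 6.2: with W = 16, a single uncompressed table requires $S = 16 \cdot 2^{32}\ \mathrm{bit} = 2^{36}\ \mathrm{bit}$ (8 GiB), and a three-exposure Wyckoff set (N = 3) requires $P = 3 \cdot 3 \cdot 2/2 = 9$ instances, far beyond the few megabits of block RAM available on a mid-sized FPGA.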

Thus, when implemented on FPGA, a compression method is required for storing the CCRF efficiently. Instead of using a total size of S · P to store all the results arising from combinations of all values of f_1 and f_2, a large number of the table entries are grouped together where the rate of change in these values is low. The detailed implementation is given in the later sections. The resulting circuit is an interpolating memory for general two-dimensional lookup, with the function compressed using a quadtree algorithm [32].

6.3 Quadtree Compression

To implement the CCRF lookup on a mid-size FPGA, where composition needs to be fast and on-chip storage (block RAM) is limited, we choose to compress the CCRF based on its quadtree representation, following an indirect table lookup architecture that is widely used in the area of computer arithmetic. The proposed quadtree algorithm compresses and implements a two-dimensional lookup table on an FPGA by bounding the error and space of a quadtree representation of the original lookup table according to the expected usage, so that the lookup table is compressed to fit within the total amount of block RAM available in a mid-sized FPGA.

For any given sensor, the comparagram is directly related to the camera response function, and the most frequently accessed CCRF values lie along the comparagram. Therefore, high interpolation precision may not be necessary for CCRF lookup points that are distant from the comparagram [38]. This suggests that the error constraint for the pair (f_i(x), f_{i+1}(x)) should vary depending on its likelihood of occurrence. The error bound is therefore weighted by the expected usage, so that values used more often have a smaller error bound. As the result of the compression, the quadtree representation is a sparse tree with pre-computed values stored in its leaves for interpolation [1].

For the N × N square table with N quantization steps on its columns and rows, used to represent each combination of f_i(x) and f_{i+1}(x), we can generate a quadtree that fully represents the CCRF lookup table, where all nodes are squares and each parent node contains four child nodes. If the quadtree is a complete tree, it has $\log_4 N^2 = \log_2 N$ levels. One method of generating such a tree is to recursively divide a unit square into four quadrants (four smaller but equally sized squares). We can visualize the center of the divided unit square as the parent node of the four quadrants, and the center of each quadrant as a child node. This process is performed recursively in each quadrant until the root unit square is divided into N × N equally sized squares. The bottom nodes of the quadtree are the leaves of the tree, each of which stores the CCRF lookup values of the corresponding pixel pair (f_i(x), f_{i+1}(x)), as shown in Fig. 6.2. In this figure, darkly dotted areas correspond to joint values that are more likely to occur; regions of rapid change or high use are more finely subdivided for greater accuracy.


Figure 6.2: Quadtree Representation of CCRF - This example shows a compressed CCRF with the quadtree algorithm.

Inside each square, the CCRF value is approximated using bilinear interpolation based on the corner values. Note the self-similar nature of the representation, with a granularity that matches the regions of interest, defined by the error in reconstruction and the expected frequency of use.

A good compression method ensures that high precision can be achieved using a small amount of resources. Larger leaf nodes are used in areas that are smoother and less used; as shown in Fig. 6.2, the CCRF lookup table is less steep in these areas. A CCRF lookup table usually has 1024 columns for f_i(x) and 1024 rows for f_{i+1}(x) [2]. Therefore, there is no point in constructing a quadtree with a depth greater than log_2 1024 = 10. The depth of the tree can be constrained to fewer than 10 levels as long as the error remains acceptable. This affects the resulting number of nodes in the CCRF quadtree, as well as the number of iterations required to search for a leaf node. In terms of hardware benefits, it results in improved timing and reduced resource usage.

6.4 Circuit Architecture for 2D Function Evaluation

The compressed CCRF indicates that the circuit design should follow an indirect table lookup architecture. Depending on the number of table entries used in designing a specific logic circuit, we can usually define a fuzzy boundary between the two uses of tables (in a supporting or a primary role). The two extreme endpoints of table use are pure logic (with no table entries) and direct lookup (with all entries pre-computed). For CCRF evaluation, direct lookup can be very expensive: the exponential growth of the table size often becomes intolerable [34]. To estimate the resource usage, consider the general case of an m-variable function $f(x_{m-1}, x_{m-2}, \ldots, x_1, x_0)$, where the word-length of input $x_i$ is $w_i$ and the word-length of the output, defined by the precision requirements, is denoted $v$. Then the quantity

$$u = \sum_{i=0}^{m-1} w_i$$

gives the width of the concatenated inputs, covering all combinations of input values. Thus, by concatenating the inputs, a u-bit address can be generated for storing the corresponding v-bit entry in the lookup table, which has a total size of $(2^u - 1) \times v$.
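For example (a worked calculation using the CCRF dimensions quoted in Section 6.3): for a two-variable table with $w_0 = w_1 = 10$ (1024 quantization steps each) and $v = 16$, we get $u = 20$, so a direct lookup table would hold roughly $2^{20} \times 16\ \mathrm{bit} = 16\ \mathrm{Mibit}$ per table before compression.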

[Diagram: OPERAND (u bits) → PRE-PROCESS → TABLE LOOKUP → POST-PROCESS → RESULT (v bits).]

Figure 6.3: Indirect Table Lookup - General architecture of indirect table lookup with pre- and post-processing.

To reduce the table size, logic circuits are usually used together with the table to pre-process and post-process the operands, leading to the indirect table lookup shown in Fig. 6.3. In modern FPGAs, on-chip block memory makes table lookups practical in many applications. There are two advantages to using table lookup methods: they reduce the hardware development cost by pre-computing all the results using software programs, and they improve the system reliability, since the results are usually stored in read-only memories.

As shown in Fig. 6.4, the compositing circuit consists of circuits that perform addressing and interpolation. A pipelined addressing circuit and an interpolation circuit are used together to process the operands f_i(x) and f_{i+1}(x). To further reduce resources while producing a more accurate result, the region being interpolated must be divided into non-uniform pieces [15, 16]. Evaluation of the CCRF function then starts with a table lookup to retrieve the coefficients based on the given operands f_i(x) and f_{i+1}(x); the addressing circuit is used here to generate the addresses. The interpolation circuit then takes the table entries and approximates the function. Thus, based on the architecture for indirect table lookup, the compositing circuit (shown in Fig. 6.4) is constructed with address generation and interpolation as pre- and post-processing, respectively.


Figure 6.4: Quadtree CCRF Compositing Circuit - The top-level system diagram of the CCRF compositing circuit.

The proposed architecture above is a table-based two-dimensional function evaluator. By designing the circuit so that it distributes its resource usage over all the types available on an FPGA architecture, its components can be grouped into four parts according to the resources they use. Pre-computed addresses mainly use LUTRAM resources on the FPGA. They are retrieved by a multiplexer network that has a similar quadtree structure to the compressed CCRF, according to the input values of $f_i(x)$ and $f_{i+1}(x)$. The multiplexer network is implemented as a combinational circuit using logic slices on the FPGA. When pipelined for extra throughput, it also consumes the registers in the logic slices. Block RAM is used for the lookup tables that store the pre-computed corner values for interpolation; its size grows with the number of leaves in the quadtree used to represent the CCRF lookup table. The arithmetic circuit is implemented mainly using the DSP slices available on the FPGA. It performs the bilinear interpolation to estimate the location of a pixel pair relative to the CCRF node it is in.

//-----------------------------------------
// 4-to-1 N-Bit MUX
// Using CASE Statement Verilog Example
//-----------------------------------------
module mux4to1_style1 #(
  parameter AW = 8
)(
  input  wire [1:0]    i_sel,
  input  wire [AW-1:0] i_d00,
  input  wire [AW-1:0] i_d01,
  input  wire [AW-1:0] i_d10,
  input  wire [AW-1:0] i_d11,
  output reg  [AW-1:0] o_dat
);

always @(i_sel, i_d00, i_d01, i_d10, i_d11)
begin
  case (i_sel)
    2'b00   : o_dat <= i_d00;
    2'b01   : o_dat <= i_d01;
    2'b10   : o_dat <= i_d10;
    2'b11   : o_dat <= i_d11;
    default : o_dat <= i_d00;
  endcase
end

endmodule

//---------------------------------------
// 4-to-1 N-Bit MUX
// Using IF Statement Verilog Example
//---------------------------------------
module mux4to1_style2 #(
  parameter AW = 8
)(
  input  wire [1:0]    i_sel,
  input  wire [AW-1:0] i_d00,
  input  wire [AW-1:0] i_d01,
  input  wire [AW-1:0] i_d10,
  input  wire [AW-1:0] i_d11,
  output reg  [AW-1:0] o_dat
);

always @(i_sel, i_d00, i_d01, i_d10, i_d11)
begin
  if      (i_sel == 2'b00) o_dat <= i_d00;
  else if (i_sel == 2'b01) o_dat <= i_d01;
  else if (i_sel == 2'b10) o_dat <= i_d10;
  else                     o_dat <= i_d11;
end

endmodule

6.5 Multiplexer Network

The addressing circuit consists of a multiplexer network and pre-calculated addresses. A reasonable coding style has to be chosen to describe the multiplexers. Based on the Xilinx synthesis guide [46], two algorithms that produce addressing circuits in different styles have been used for comparison:

Using CASE Statements: A multiplexer instance is written manually with a case statement first. The software coder algorithm then instantiates these multiplexers and wires them according to the structure of the tree. This algorithm is shown in Algorithm 2 in Appendix B.

Using IF-ELSE Statements: Despite the concern of creating a priority chain, this coding style was also experimented with. Instead of instantiating multiplexers to construct a network, a long nested if-else structure is generated while the quadtree is traversed. This algorithm is shown in Algorithm 1 in Appendix A.

The relation between the original quadtree and the multiplexer network makes it very straightforward to generate the Verilog HDL using the same quadtree data structure in software. Efficiently using 4-to-1 multiplexers in the 6-LUT FPGA architecture can significantly reduce the resource usage and simplify the code generation algorithm [47]. For each leaf in the quadtree representation of the CCRF lookup table, we store pre-calculated values in the node for bilinear interpolation. As the algorithm traverses the non-leaf nodes, a multiplexer tree is generated according to conditions associated with each bit of the input pixel value. The boundaries of neighbouring tree nodes are powers of 2. Thus, instead of using comparators as multiplexer conditions, each bit of the input tonal values (i.e., $p_{i,j-1}$[Bit] and $p_{i,j}$[Bit]) is used as a mux selection signal as we go deeper along the tree.


Figure 6.5: A Simple CCRF with Quadtree Compression and Multiplexers - A simple example showing the address usage in the CCRF lookup table, its quadtree representation, and the mux tree implementation.

A simple example showing the design of the multiplexer network and the relation between the CCRF and its quadtree representation is given in Figure 6.5. Since the direction of the data flow in the final circuit is opposite to that of the HDL coder algorithm, the MSBs of $f_i(x)$ and $f_{i+1}(x)$ select among the four inputs of the rightmost multiplexer in Figure 6.5. In this simple example, even a complete CCRF lookup table would have only 8 × 8 entries. Only 4-bit tonal values are needed to generate the address (an extra bit is appended and set to 1'b1 for each input to offset the tonal value into the middle of the quadtree node).

The actual circuit generated contains 644 multiplexers and more than 8000 lines of Verilog HDL. After both generated circuits are synthesized, they are found to use almost the same resources and to have almost the same critical path delay (if priority chains were created, the delay would be much longer). This gives evidence that the Xilinx synthesis tool converts the nested IF-ELSE statements into a proper multiplexing network. The two coder algorithms are given as pseudo-code in Appendices A and B, together with the generated Verilog HDL for the simple example shown in Figure 6.5.


Figure 6.6: Interpolation Circuit - Block diagram showing the arithmetic and pipeline stages for the bilinear interpolation circuit.

6.6 Interpolation Circuit

As an extension of linear interpolation, bilinear interpolation is often used in image re-sampling algorithms. The mechanism is similar, except that it is performed on a two-dimensional grid. Thus, for a function of two variables, bilinear interpolation is a feasible choice. Suppose the values of $F$ are known at the four corner points $Q_{11} = (x_1, y_1)$, $Q_{12} = (x_1, y_2)$, $Q_{21} = (x_2, y_1)$, and $Q_{22} = (x_2, y_2)$. To find the value of the bivariate function $F(x, y)$ at a point $P = (x, y)$, in the x-direction we have,

$$ F(R_1) \approx \frac{x_2 - x}{x_2 - x_1} F(Q_{11}) + \frac{x - x_1}{x_2 - x_1} F(Q_{21}), $$

where $R_1 = (x, y_1)$, and,

$$ F(R_2) \approx \frac{x_2 - x}{x_2 - x_1} F(Q_{12}) + \frac{x - x_1}{x_2 - x_1} F(Q_{22}), $$

where $R_2 = (x, y_2)$. Then, interpolating in the y-direction combines the two results and produces the estimate of $F(P)$ as,

$$ F(P) \approx \frac{y_2 - y}{y_2 - y_1} F(R_1) + \frac{y - y_1}{y_2 - y_1} F(R_2). $$

The fixed-point arithmetic circuit for bilinear interpolation is given as a block diagram in Figure 6.6. The dashed lines represent the pipeline stages that can be used to increase the circuit operating frequency.
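A fixed-point sketch of this datapath is given below; the port names, widths, and the exact pipeline boundaries are illustrative assumptions, while the actual generated circuit obtains the corner values from the lookup tables described earlier:

//---------------------------------------------------------------
// Fixed-point bilinear interpolation sketch (registers stand in
// for the dashed pipeline stages of Fig. 6.6). FW is the assumed
// width of the fractional offsets dx, dy inside the quadtree node,
// so the weights are dx/2^FW and dy/2^FW.
//---------------------------------------------------------------
module bilerp #(
  parameter DW = 16,  // corner value width
  parameter FW = 8    // fraction width
)(
  input  wire          i_clk,
  input  wire [DW-1:0] i_q11, i_q21, i_q12, i_q22, // corner values
  input  wire [FW-1:0] i_dx, i_dy,                 // offsets in node
  output reg  [DW-1:0] o_q
);
  // Stage 1: interpolate along x on both rows
  reg [DW+FW-1:0] r1, r2;
  reg [FW-1:0]    dy_d;
  always @(posedge i_clk) begin
    r1   <= i_q11 * ((1<<FW) - i_dx) + i_q21 * i_dx;
    r2   <= i_q12 * ((1<<FW) - i_dx) + i_q22 * i_dx;
    dy_d <= i_dy;  // delay dy to stay aligned with r1/r2
  end

  // Stage 2: interpolate along y
  reg [DW+2*FW-1:0] acc;
  always @(posedge i_clk)
    acc <= r1 * ((1<<FW) - dy_d) + r2 * dy_d;

  // Stage 3: renormalize by 2^(2*FW) with a right shift
  always @(posedge i_clk)
    o_q <= acc >> (2*FW);
endmodule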

6.7 Pipelining

The compression algorithm constrains the depth of the tree. In a quadtree with a depth of 8, for example, the critical path delay of the generated addressing circuit will be longer than the pixel clock period. Thus, pipelining of the addressing circuit is required to increase its operating frequency and support HDR videos with higher resolution. This is achieved by breaking the long multiplexer chain into multiple shorter stages and inserting pipeline registers.

                                       Average Usage per Instance
Module                     # Instances   Slice LUT   BRAM   DSP
Addressing Circuit         3N(N-1)/2     384         0      0
Interpolation Circuit      3N(N-1)/2     81          63     18
Total Resource Available                 54685       890    840

Table 6.1: Resource Usage of the CCRF Compositing Circuit.



The addressing circuit contains multiplexers at many levels joined to form a tree. To achieve the highest operating frequency, all levels of the tree can be pipelined. This, however, wastes resources, since the number of muxes increases with the depth of the tree (i.e., pipelining the deep levels is more expensive without a significant increase in the maximum operating frequency). Also, with each level of pipeline registers added, the select signals must also be pipelined to stay synchronized with the data at each level. It was therefore decided to pipeline only one or two levels of the tree, keeping a moderate resource utilization as long as the frequency required to process the video is achieved.

As shown by the dashed line in the rightmost circuit in Figure 6.5, the pipeline registers are inserted at locations where the nodes are not leaves. The corresponding $f_i(x)$[Bit] and $f_{i+1}(x)$[Bit] of each stage are delayed to match the pipeline. The root of each remaining sub-tree is stored, and the same algorithm is performed starting at each of these new roots for HDL generation. A fully pipelined addressing circuit could be generated to reach a maximum operating frequency of 878 MHz, but with the resource usage doubled. If several multiplexers can be traversed within one pixel clock cycle, no pipeline registers are inserted between them. Different levels of the tree were pipelined through trial and error, and the resulting circuit with one pipeline stage runs at around 258 MHz, which is sufficient for HD video applications (148.5 MHz for 1080p at 60 fps [40]).
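The sketch below shows one such pipeline cut (signal names are illustrative, not from the generated HDL): the partially resolved address is registered, and the select bits still needed by the lower levels of the tree are delayed alongside it, as described above:

//-----------------------------------------------------------
// One pipeline cut in the addressing tree (illustrative).
// The upper tree levels resolve combinationally into i_addr_c;
// the register stage then feeds the remaining levels, with the
// unused select bits delayed to stay in step with the data.
//-----------------------------------------------------------
module addr_pipe_stage #(
  parameter AW = 11,
  parameter SW = 4   // select bits still needed by later levels
)(
  input  wire          i_clk,
  input  wire [AW-1:0] i_addr_c,   // output of the upper mux levels
  input  wire [SW-1:0] i_sel_rest, // select bits for the lower levels
  output reg  [AW-1:0] o_addr_r,
  output reg  [SW-1:0] o_sel_rest
);
  always @(posedge i_clk) begin
    o_addr_r   <= i_addr_c;
    o_sel_rest <= i_sel_rest;   // keep selects synchronized with data
  end
endmodule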

6.8 Result of Quadtree CCRF Implementation

The system was prototyped and evaluated using the ACDC™ 1.0 Base Board to meet the high-bandwidth buffering requirements of real-time HDR video applications via HDMI interfaces. The board contains a Xilinx Kintex®-7 325T, which has 50,950 configurable logic blocks and 16,020 Kb of block RAM storage [48].

Maximum Depth        7      8      9      10
# Leaves             880    1267   2101   3898
Expected error (%)   0.231  0.118  0.100  0.092
Mean error (%)       0.111  0.102  0.101  0.098
# Slice LUTs         339    489    738    1206

Table 6.2: CCRF Composition Error vs. Number of Leaves.


6.8.1 Resource

A breakdown of the resource usage for the compositing circuit is given in Table 6.1. Data for the three main types of resources are shown in the table, i.e., Slice LUT, BRAM, and DSP48E1. The addressing circuit uses only slice LUTs; the interpolation circuit, on the other hand, mainly uses the other two types of resources, with the corner values looked up from block RAM and the arithmetic performed using DSP slices. The results are synthesized from a three-exposure system with a tree depth of 8. The results are averaged per instance of the different modules so that they can be generalized to N exposures. The block RAM resources (RAMB18E1) used in Xilinx 7-series FPGAs are of size 18 Kb. For the given depth of the tree, the width of the lookup table address is 11 (i.e., determined by the maximum address 1932).


Figure 6.7: Area and Error vs Depth - This diagram shows how the area and the error vary with the chosen maximum depth of the tree.

6.8.2 Error and Area

The error bound of the compression is weighted by the expected usage, so that values used more often have a smaller error bound. The error of the interpolation against the original lookup value at the input pair $(f_i(x), f_{i+1}(x))$ should be within the threshold set in the program. This error is also affected by the interpolation method. The absolute difference between the original CCRF values and the values reconstructed from the compressed CCRF is calculated and then compared against the normalized mass of the comparagram at the point being estimated, producing the resulting table (Table 6.2). The reference CCRF without compression contains 1024 × 1024 floating-point entries.

The errors and areas are also plotted against the different depth bounds used in the implementation (Figure 6.7). From this plot, a depth of 8 is chosen, because this is where the error curve starts dropping more slowly while the total area remains reasonable.

Figure 6.8: High-Definition HDR Video based on Quadtree Composition - This figure shows a demonstration of the HD-HDR video implementation on a mid-sized FPGA. A hand is held in front of the 200 W light bulb because otherwise the camera used to take this photo could not capture the rest of the scene.

6.9 Conclusions

The quadtree HDR compositing provides us with a method that generates a compositing circuit from a Wyckoff set of arbitrary size N. Figure 6.8 shows a demonstration of the High-Definition HDR (HD-HDR) video output on a 1080p digital TV. The program written generates a BRAM-based circuit within a mid-sized FPGA by fully utilizing the different kinds of resources, thus providing a mechanism for two-variable function evaluation. Compared to its GPU/CPU counterpart [38], it is more power-efficient. The implementation, which supports 1080p video through HDMI interfaces at 60 fps, is able to produce an HDR video stream in real time with a dynamic range of 20 stops, exceeding the capability of the human eye.


Chapter 7

Spatial Tonal Mapping

In order to properly display HDR frames on commercial 8-bit displays, tone mapping is needed to map the colors of the photoquantities to 8-bit colors that approximate the appearance of the composited image. The top-level block diagram of the tonal mapping circuit is given in Figure 7.1. After each photoquantity has been calculated, the three channels are compressed with an inverse-power function and converted into the YCbCr color space. The converted image is processed twice by three other blocks for edge enhancement. Before the final output, the video data is converted back to the RGB color space.

To implement two-dimensional convolution for HDR video processing, storing complete HDR frames in memory is not a viable option due to the memory limitation; therefore, a window of pixels must be available for processing at each cycle. Since only part of the video frame is needed at any given time, the solution is to store m lines of the frame, where m is the size of the convolution kernel. For a reasonable amount of resource usage, m = 5 is used in HDRchitecture. The effective size of a convolution kernel can be increased by multiple stages of repeated processing.


Figure 7.1: Spatial Tonal Mapping Circuit - The block diagram of the tone-mapping circuit based on convolution.


7.1 Color Space Conversion

Instead of instantiating the same tonal mapping circuit module three times, once for each of the channels in the RGB color space, the compressed RGB values are first converted into the YCbCr color space. In HDRchitecture, only the Y channel is used for tonal mapping, since it is essentially a greyscale copy of the image. The equations implemented to convert 8-bit tonal values from RGB into YCbCr are given as follows,

$$ Y = 16 + \frac{65.738}{256}R + \frac{129.057}{256}G + \frac{25.064}{256}B \qquad (7.1) $$

$$ Cb = 128 - \frac{37.945}{256}R - \frac{74.494}{256}G + \frac{112.439}{256}B \qquad (7.2) $$

$$ Cr = 128 + \frac{112.439}{256}R - \frac{94.154}{256}G - \frac{18.285}{256}B \qquad (7.3) $$

The inverse transform is,

$$ R = \frac{298.082}{256}Y + \frac{408.583}{256}Cr - 222.921 \qquad (7.4) $$

$$ G = \frac{298.082}{256}Y - \frac{100.291}{256}Cb - \frac{208.120}{256}Cr + 135.576 \qquad (7.5) $$

$$ B = \frac{298.082}{256}Y + \frac{516.412}{256}Cb - 276.836. \qquad (7.6) $$

7.2 Window Provisioning Buffer

The window provisioning buffer module contains five line buffers and five shift registers for providing a 5 × 5 pixel window, acting as a storage system with parallel access for multiple processing stages. Since the HDR composition block outputs 16-bit color values per channel, each line buffer is implemented using a 48-bit-wide, 1024-deep true dual-port block RAM to store a full line of pixels.

Each shift register is associated with a line buffer and provides five pixels of that line, so the five shift registers together provide parts of five lines, constructing the pixel window. During a valid horizontal sync period, only one line buffer is written at a time, while all of them are read with the same read address to provide a vertical line of pixels. The read address is always smaller than the write address to avoid access collisions. The line buffer selected for writes is switched at every horizontal sync signal in a circular fashion.

Figure 7.2: Convolution with Binary Kernel - This diagram shows the operation of the image blurring using the binary kernel.

During a read, a new pixel goes into the first register, and the pixel from six clock cycles ago in the fifth flip-flop is replaced by the pixel from five clock cycles ago. This produces the effect of a window shifting to the right across the frame while the pixels within it shift to the left. To produce a down-shift effect of the window at every horizontal sync, the shift register associated with the line buffer currently being written is used for the bottom row of the window, the one associated with the most recently written line buffer is used for the second bottom row, and so on. Together with the centers of the window, the constructed sub-frames are sent to the convolution module, where the image data is calculated and blurred.
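A condensed sketch of one line-buffer and shift-register pair is given below (the module name and interface are assumptions; the real buffer manages five such pairs plus the circular write selection described above):

//-----------------------------------------------------------
// One line-buffer / shift-register pair of the 5x5 window
// (condensed sketch). Five such pairs, written in circular
// order on each horizontal sync, build the full window.
//-----------------------------------------------------------
module line_tap #(
  parameter DW = 48,   // 3 x 16-bit channels
  parameter AW = 10    // 1024-deep line buffer
)(
  input  wire            i_clk,
  input  wire            i_we,
  input  wire [AW-1:0]   i_waddr,
  input  wire [AW-1:0]   i_raddr,   // kept behind i_waddr to avoid collision
  input  wire [DW-1:0]   i_pix,
  output wire [5*DW-1:0] o_taps     // five horizontal taps of this line
);
  reg [DW-1:0] buffer [0:(1<<AW)-1]; // infers dual-port block RAM
  reg [DW-1:0] rdata;
  reg [DW-1:0] t0, t1, t2, t3, t4;

  always @(posedge i_clk) begin
    if (i_we) buffer[i_waddr] <= i_pix; // write port
    rdata <= buffer[i_raddr];           // read port, one line behind
    t0 <= rdata;                        // pixels shift left through the window
    t1 <= t0;  t2 <= t1;  t3 <= t2;  t4 <= t3;
  end

  assign o_taps = {t4, t3, t2, t1, t0};
endmodule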

7.3 Convolution and Edge Enhancement

The convolution module processes the window provided by the window provisioning buffer. As shown in Fig. 7.2, it uses a two-dimensional Gaussian kernel to convolve the video input. However, it is very expensive to perform multiplication and division: multipliers use excessive amounts of resources and also result in a slower circuit. The current approach constructs a binary kernel and then uses shift operations as multipliers, which dramatically reduces resource usage while providing a decent approximation of the actual result. Each element in the kernel is multiplied with the corresponding pixel in the window, and the products are summed to produce the convolved output for the center pixel, blurring the image. Instead of dividing each kernel element by 256, the output is shifted 8 bits to the right to normalize it.
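The sketch below shows the shift-based weighting for one 5-pixel row of the window. The binary kernel row [1 4 6 4 1] is an assumed example: its outer product with itself yields a 5 × 5 kernel whose weights sum to 256, so the final normalization is the 8-bit right shift described above:

//-----------------------------------------------------------
// Shift-based weighting for one 5-pixel row of the window,
// using the assumed binary kernel row [1 4 6 4 1].
//-----------------------------------------------------------
module conv_row #(
  parameter DW = 16
)(
  input  wire [DW-1:0] i_p0, i_p1, i_p2, i_p3, i_p4,
  output wire [DW+3:0] o_sum   // row weights sum to 16, so 4 extra bits
);
  // 6*x implemented as (x<<2) + (x<<1): shifts replace multipliers
  assign o_sum = i_p0
               + (i_p1 << 2)
               + (i_p2 << 2) + (i_p2 << 1)
               + (i_p3 << 2)
               + i_p4;
endmodule

Summing five such row results, weighted vertically by the same [1 4 6 4 1] pattern, and shifting the total right by 8 bits completes the 5 × 5 convolution without a single multiplier.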

The edge is calculated by subtracting the blurred pixel from the original pixel.


                                      Resources per Instance
Module                       Slice LUT   Slice Reg   Block RAM   DSP
Window Provisioning Buffer   1260        865         8           0
Convolution Circuit          681         752         0           0
Color Space Converters       339         365         0           4

Table 7.1: Resource Usage of the Tonal Mapping Circuit.

Different edges are added back to the original image through the multiple stages. For a tonal mapping circuit that consists of two stages, there are two blurred images that can be used for extracting edges. These edges can then be weighted accordingly and added back to further enhance the image. The ideal amount of edge enhancement should produce a sharp-looking video that is pleasant to the viewer.
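In equation form, the multi-stage enhancement described above is an unsharp-masking operation (the weights $w_1$ and $w_2$ are tuning parameters chosen by the designer, not values fixed in this thesis):

$$ I_{\text{out}} = I + w_1\,(I - B_1) + w_2\,(I - B_2), $$

where $B_1$ and $B_2$ are the blurred images produced by the first and second convolution stages.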

7.4 Resource Usage

The resource usage of the synthesized circuit is given in Table 7.1. For a two-stage tonal mapping circuit, the number of window provisioning and convolution circuit instantiations is twice that of a single-stage circuit. The resource usage of the color space converters includes both the RGB-to-YCrCb and the YCrCb-to-RGB conversions. The number of converter instances does not change with the number of stages.

7.5 Summary

The tonal mapping circuit for HDRchitecture is implemented for viewing and evaluating the compositing result. For tonal mapping that uses a local operator, the window provisioning buffer can be reused. Different kernels can easily be transformed into their normalized binary forms and then inserted into the convolution circuit. There are, however, some non-convolution-based methods that can be used for tonal mapping, and these methods are not implemented in HDRchitecture. It is hard to design a general architecture on an FPGA for evaluating all possible tonal mapping algorithms, so a circuit has to be manually coded if a new tonal mapping algorithm is to be evaluated.


Chapter 8

Conclusion and Future Work

“The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.” - Marcel Proust

This thesis presented a parameterizable implementation for evaluating quantigraphic HDR compositing methods on FPGA. The generated circuits with different compositing methods were prototyped on both a small-sized FPGA (Spartan®-6) and a medium-sized FPGA (Kintex®-7). When the prototype of three-frame HDR was first presented at conferences, it attracted a lot of attention, because a high-quality HDR video was produced in real time by a cost-efficient system, as shown in Figure 8.1.

Figure 8.1: Real-Time HDR on FPGA Demonstration - This figure illustrates the demonstration of a miniaturized FPGA system for real-time HDR video.


8.1 Contribution

HDRchitecture implements a generalized framework for quantigraphic HDR composition. Its hardware architecture is determined by the key processing stages that are commonly found in HDR photography. The firmware of the capturing device used is modified to capture videos with temporally alternating exposures. The HDR frame buffer implementation takes in the raw video and splits it into videos that each contain a single exposure setting. The frame buffer interface is AXI4-compatible and can thus be easily adapted to other Xilinx FPGA boards. The composition circuit architecture ensures that, for any given method of recovering the camera response function, the change to the weighted-sum circuit is easily made by changing the lookup table initialization files. For more complicated methods, a two-dimensional table lookup is optimized for resources by using the quadtree algorithm. The tonal mapping circuit uses a window-sliding sub-frame buffer to achieve multi-stage convolution of size 5 × 5. The resulting video is displayable on commercial 8-bit monitors.

8.2 Future Work

For most miniaturized sensor chips, the exposure is controlled by software that cannot keep up when exposure changes have to be made every frame. Thus, these sensors cannot be used for producing a temporally alternating exposure in the video. In HDRchitecture, many assumptions on precision and compression were made based on the HDR video quality. The numerical precision of this architecture needs to be investigated further to make it a general function evaluator. Several improvements to the multiplexer network have been proposed and implemented without a significant reduction in resource usage; since this is the most resource-intensive part of the circuit, more ideas for reducing the resources need to be considered. Although an FPGA implementation consumes more power than its ASIC counterpart, real-time HDR on FPGA provides a programmable prototyping platform for HDR ASIC development. As a branch in the broader area of computational photography, the real-time HDR implemented in this thesis is able to produce an HDR video stream with a dynamic range of more than 20 stops. The current use of the system targets industrial applications. Towards an essentially self-contained miniaturized hardware camera system for personal imaging, more work needs to be done so that it can be built into smaller eyeglass frames, for use in various wearable computing and augmediated reality applications, as well as to help people see better in their everyday lives.


Appendix A

CCRF MUX Network with Nested IF-ELSE Statements

The generated Verilog HDL for the simple multiplexer network shown in Figure 6.5 of Chapter 6 is given here as an example for illustration purposes. This implementation uses a pre-order tree walk on the quadtree to generate nested IF-ELSE statements for constructing the addressing circuit. The algorithm is given in Algorithm 1 below.

Algorithm 1 HDL Coder with Nested if-else

procedure HDLCODER1(Node, Bit)
    if Node is not a leaf then                          ▷ Addressing circuit
        Print Condition 1
        HDLCODER1(Node→Child(right, up), Bit-1)
        Print "end"
        Repeat for the other three children with the corresponding conditions...
    else                                                ▷ Interpolation circuit
        Append the retrieved LUT entries at the end of each memory initialization file
        Increment address counter by 1
    end if
end procedure

Here are all the conditions used above:
Condition 1: “if (f1[Bit] == 1 && f2[Bit] == 1) begin”
Condition 2: “else if (f1[Bit] == 0 && f2[Bit] == 1) begin”
Condition 3: “else if (f1[Bit] == 0 && f2[Bit] == 0) begin”
Condition 4: “else begin”                               ▷ i.e., (f1[Bit] == 1 && f2[Bit] == 0)


`timescale 1ns / 1ps
//==========================================================
// File Name       : addr_cct_nest.v
//----------------------------------------------------------
// Author          : Tao Ai
// Create Date     : 02/01/2014
// Project Name    : CCRF Quadtree Compression
// Target Devices  : XC7K325T-2FFG900
// Tool versions   : ISE 14.4 (64-bit)
//----------------------------------------------------------
// Description     : The Addressing Circuit
//                   with Nested IF-ELSE Statements
//----------------------------------------------------------
// Comments        : Created by Quadtree Software Coder
//==========================================================
module addr_cct_nest #(
  parameter DW = 4,
  parameter AW = 5
)(
  input      [DW-1:0] p1,
  input      [DW-1:0] p2,
  output reg [AW-1:0] addr
);

//======================
// Generate Muxes
//======================
always @(*) begin
  if (p1[3] == 1 && p2[3] == 1) begin
    addr <= 5'd0;
  end else if (p1[3] == 0 && p2[3] == 1) begin
    if (p1[2] == 1 && p2[2] == 1) begin
      if (p1[1] == 1 && p2[1] == 1) begin
        addr <= 5'd1;
      end else if (p1[1] == 0 && p2[1] == 1) begin
        addr <= 5'd2;
      end else if (p1[1] == 0 && p2[1] == 0) begin
        addr <= 5'd3;
      end else begin // (p1[1] == 1 && p2[1] == 0)
        addr <= 5'd4;
      end
    end else if (p1[2] == 0 && p2[2] == 1) begin
      addr <= 5'd5;
    end else if (p1[2] == 0 && p2[2] == 0) begin
      addr <= 5'd6;
    end else begin // (p1[2] == 1 && p2[2] == 0)
      addr <= 5'd7;
    end
  end else if (p1[3] == 0 && p2[3] == 0) begin
    if (p1[2] == 1 && p2[2] == 1) begin
      addr <= 5'd8;
    end else if (p1[2] == 0 && p2[2] == 1) begin
      addr <= 5'd9;
    end else if (p1[2] == 0 && p2[2] == 0) begin
      addr <= 5'd10;
    end else begin // (p1[2] == 1 && p2[2] == 0)
      if (p1[1] == 1 && p2[1] == 1) begin
        addr <= 5'd11;
      end else if (p1[1] == 0 && p2[1] == 1) begin
        addr <= 5'd12;
      end else if (p1[1] == 0 && p2[1] == 0) begin
        addr <= 5'd13;
      end else begin // (p1[1] == 1 && p2[1] == 0)
        addr <= 5'd14;
      end
    end
  end else begin // (p1[3] == 1 && p2[3] == 0)
    if (p1[2] == 1 && p2[2] == 1) begin
      addr <= 5'd15;
    end else if (p1[2] == 0 && p2[2] == 1) begin
      addr <= 5'd16;
    end else if (p1[2] == 0 && p2[2] == 0) begin
      addr <= 5'd17;
    end else begin // (p1[2] == 1 && p2[2] == 0)
      addr <= 5'd18;
    end
  end
end

endmodule


Appendix B

CCRF MUX Network with Separated Instantiations

The generated Verilog HDL for the simple multiplexer network shown in Figure 6.5 of Chapter 6 is given here as an example for illustration purposes. This implementation uses a pre-order tree walk on the quadtree to generate wires and multiplexer instantiations for constructing the addressing circuit. The algorithm is given in Algorithm 2 below.

Algorithm 2 HDL Coder with Mux Instantiation

procedure HDLCODER2(Node, Bit, Wire)
    Instantiate MUX with wire names...
    if Node is not a leaf then                          ▷ Addressing circuit
        Generate new wire name "Wire_new"
        if its children are not leaves then
            HDLCODER2(Child(right, up), Bit-1, Wire_new)
        else
            Assign wires to pre-calculated address
            Increment address counter by 1
        end if
        Repeat for the other three children...
    else                                                ▷ Interpolation circuit
        Append the retrieved LUT entries at the end of each memory initialization file
        Increment address counter by 1
    end if
end procedure


`timescale 1ns / 1ps
//==========================================================
// File Name       : addr_cct_inst.v
//----------------------------------------------------------
// Author          : Tao Ai
// Create Date     : 02/01/2014
// Project Name    : CCRF Quadtree Compression
// Target Devices  : XC7K325T-2FFG900
// Tool versions   : ISE 14.4 (64-bit)
//----------------------------------------------------------
// Description     : The Addressing Circuit
//                   with Separate MUX Instantiations
//----------------------------------------------------------
// Comments        : Created by Quadtree Software Coder
//==========================================================
module addr_cct_inst #(
  parameter DW = 4,
  parameter AW = 5
)(
  input      [DW-1:0] p1,
  input      [DW-1:0] p2,
  output reg [AW-1:0] addr
);

//======================
// Generate Muxes
//======================
wire [AW-1:0] s_addr;
always @(*) begin
  addr <= s_addr;
end

// Root of the tree, selected by the MSBs of the two operands
wire [AW-1:0] s_m1l1c11, s_m1l1c01, s_m1l1c00, s_m1l1c10;
mux #(.AW(AW)) u_mux1 (
  .i_sel    ({p1[3], p2[3]}),
  .i_data_11(s_m1l1c11),
  .i_data_01(s_m1l1c01),
  .i_data_00(s_m1l1c00),
  .i_data_10(s_m1l1c10),
  .o_data   (s_addr)
);
assign s_m1l1c11 = 5'd0;
wire [AW-1:0] s_m2l2c11, s_m2l2c01, s_m2l2c00, s_m2l2c10;
mux #(.AW(AW)) u_mux2 (
  .i_sel    ({p1[2], p2[2]}),
  .i_data_11(s_m2l2c11),
  .i_data_01(s_m2l2c01),
  .i_data_00(s_m2l2c00),
  .i_data_10(s_m2l2c10),
  .o_data   (s_m1l1c01)
);
wire [AW-1:0] s_m3l3c11, s_m3l3c01, s_m3l3c00, s_m3l3c10;
mux #(.AW(AW)) u_mux3 (
  .i_sel    ({p1[1], p2[1]}),
  .i_data_11(s_m3l3c11),
  .i_data_01(s_m3l3c01),
  .i_data_00(s_m3l3c00),
  .i_data_10(s_m3l3c10),
  .o_data   (s_m2l2c11)
);
assign s_m3l3c11 = 5'd1;
assign s_m3l3c01 = 5'd2;
assign s_m3l3c00 = 5'd3;
assign s_m3l3c10 = 5'd4;
assign s_m2l2c01 = 5'd5;
assign s_m2l2c00 = 5'd6;
assign s_m2l2c10 = 5'd7;
wire [AW-1:0] s_m4l2c11, s_m4l2c01, s_m4l2c00, s_m4l2c10;
mux #(.AW(AW)) u_mux4 (
  .i_sel    ({p1[2], p2[2]}),
  .i_data_11(s_m4l2c11),
  .i_data_01(s_m4l2c01),
  .i_data_00(s_m4l2c00),
  .i_data_10(s_m4l2c10),
  .o_data   (s_m1l1c00)
);
assign s_m4l2c11 = 5'd8;
assign s_m4l2c01 = 5'd9;
assign s_m4l2c00 = 5'd10;
wire [AW-1:0] s_m5l3c11, s_m5l3c01, s_m5l3c00, s_m5l3c10;
mux #(.AW(AW)) u_mux5 (
  .i_sel    ({p1[1], p2[1]}),
  .i_data_11(s_m5l3c11),
  .i_data_01(s_m5l3c01),
  .i_data_00(s_m5l3c00),
  .i_data_10(s_m5l3c10),
  .o_data   (s_m4l2c10)
);
assign s_m5l3c11 = 5'd11;
assign s_m5l3c01 = 5'd12;
assign s_m5l3c00 = 5'd13;
assign s_m5l3c10 = 5'd14;
wire [AW-1:0] s_m6l2c11, s_m6l2c01, s_m6l2c00, s_m6l2c10;
mux #(.AW(AW)) u_mux6 (
  .i_sel    ({p1[2], p2[2]}),
  .i_data_11(s_m6l2c11),
  .i_data_01(s_m6l2c01),
  .i_data_00(s_m6l2c00),
  .i_data_10(s_m6l2c10),
  .o_data   (s_m1l1c10)
);
assign s_m6l2c11 = 5'd15;
assign s_m6l2c01 = 5'd16;
assign s_m6l2c00 = 5'd17;
assign s_m6l2c10 = 5'd18;

endmodule


References

[1] MIR ADNAN ALI, TAO AI, AKSHAY GILL, JOSE EMILIO, KALIN OVTCHAROV, AND STEVE MANN. Comparametric HDR (High Dynamic Range) Imaging for Digital Eye Glass, Wearable Cameras, and Sousveillance. In Technology and Society (ISTAS), 2013 IEEE International Symposium on, pages 107–114. IEEE, 2013.

[2] MIR ADNAN ALI AND STEVE MANN. Comparametric Image Compositing: Computationally Efficient High Dynamic Range Imaging. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 913–916. IEEE, 2012.

[3] A. BARROS AND F. M. CANDOCIA. Image Registration in Range Using a Constrained Piecewise Linear Model. IEEE ICASSP, IV:3345–3348, May 13-17, 2002. Available at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.

[4] F. M. CANDOCIA. A Least Squares Approach for the Joint Domain and Range Registration of Images. IEEE ICASSP, IV:3237–3240, May 13-17, 2002. Available at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.

[5] F. M. CANDOCIA. Synthesizing a Panoramic Scene with a Common Exposure via the Simultaneous Registration of Images. FCRAR, May 23-24, 2002. Available at http://iul.eng.fiu.edu/candocia/Publications/Publications.htm.

[6] ARNAUD DARMONT. HDR Imaging: Sensors and Architectures. SPIE Press, 2012. ISBN: 978-0-8194-8830-5.

[7] P. E. DEBEVEC AND J. MALIK. Recovering High Dynamic Range Radiance Maps from Photographs. SIGGRAPH, pages 369–378, 1997.


[8] EDUARDO S. L. GASTAL AND MANUEL M. OLIVEIRA. Domain Transform for Edge-Aware Image and Video Processing. ACM Transactions on Graphics (TOG), 30(4):69, 2011.

[9] MIGUEL GRANADOS, BORIS AJDIN, MICHAEL WAND, CHRISTIAN THEOBALT, H.-P. SEIDEL, AND H. LENSCH. Optimal HDR Reconstruction with Linear Digital Cameras. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 215–222. IEEE, 2010.

[10] MICHAEL D. GROSSBERG AND SHREE K. NAYAR. Determining the Camera Response from Images: What Is Knowable? IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(11):1455–1467, 2003.

[11] XILINX INC. 7-Series FPGAs Memory Resources - User Guide. 2014. http://www.xilinx.com.

[12] XILINX INC. Spartan-6 FPGA Block RAM Resources - User Guide. 2011. http://www.xilinx.com.

[13] XILINX INC. Spartan-6 FPGA DSP48A1 Slice - User Guide. 2014. http://www.xilinx.com.

[14] S. B. KANG, M. UYTTENDAELE, S. WINDER, AND R. SZELISKI. High Dynamic Range Video. ACM Transactions on Graphics, 22(3):319–325, 2003.

[15] HOU-JEN KO, SHEN-FU HSIAO, AND WEN-LIANG HUANG. A New Non-Uniform Segmentation and Addressing Remapping Strategy for Hardware-Oriented Function Evaluators Based on Polynomial Approximation. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 4153–4156. IEEE, 2010.

[16] DONG-U LEE, WAYNE LUK, JOHN VILLASENOR, AND PETER Y. K. CHEUNG. Non-Uniform Segmentation for Hardware Function Evaluation. In Field Programmable Logic and Application, pages 796–807. Springer, 2003.

[17] R. C. H. LO, S. MANN, J. HUANG, V. RAMPERSAD, AND T. AI. High Dynamic Range (HDR) Video Image Processing for Digital Glass. In Proceedings of the 20th ACM International Conference on Multimedia, pages 1477–1480. ACM, 2012.

[18] COREY MANDERS, CHRIS AIMONE, AND STEVE MANN. Camera Response Function Recovery from Different Illuminations of Identical Subject Matter. In ICIP, pages 2965–2968, 2004.

[19] S. MANN. Compositing Multiple Pictures of the Same Scene. In Proceedings of the 46th Annual IS&T Conference, 2, 1993.


[20] S. MANN. Compositing Multiple Pictures of the Same Scene. In Proceedings of the 46th Annual IS&T Conference, pages 50–52, Cambridge, Massachusetts, May 9-14, 1993. The Society of Imaging Science and Technology. ISBN: 0-89208-171-6.

[21] S. MANN. Comparametric Equations with Practical Applications in Quantigraphic Image Processing. IEEE Trans. Image Proc., 9(8):1389–1406, August 2000. ISSN 1057-7149.

[22] S. MANN. Comparametric Equations with Practical Applications in Quantigraphic Image Processing. Image Processing, IEEE Transactions on, 9(8):1389–1406, 2000.

[23] S. MANN. Comparametric Transforms for Transmitting Eye Tap Video with Picture Transfer Protocol (PTP). In K. R. RAO AND PAT YIP, editors, Transform and Data Compression Handbook, chapter 4. CRC, September 27, 2000. ISBN: 0849336929.

[24] S. MANN, R. LO, J. HUANG, V. RAMPERSAD, AND R. JANZEN. HDRchitecture: Real-Time Stereoscopic HDR Imaging for Extreme Dynamic Range. In ACM SIGGRAPH 2012 Emerging Technologies, page 11. ACM, 2012.

[25] S. MANN AND R. MANN. Quantigraphic Imaging: Estimating the Camera Response and Exposures from Differently Exposed Images. CVPR, pages 842–849, December 11-13, 2001.

[26] S. MANN AND R. W. PICARD. Being ‘Undigital’ with Digital Cameras: Extending Dynamic Range by Combining Differently Exposed Pictures. In Proc. IS&T’s 48th Annual Conference, pages 422–428, Washington, D.C., May 7–11, 1995. Also appears as M.I.T. M.L. T.R. 323, 1994, http://wearcam.org/ist95.htm.

[27] STEVE MANN. Joint Parameter Estimation in Both Domain and Range of Functions in Same Orbit of the Projective-Wyckoff Group. Number 384, Cambridge, Massachusetts, December 1994; http://hi.eecg.toronto.edu/icip96/index.html. Also appears in: Proceedings of the IEEE International Conference on Image Processing (ICIP-96), Lausanne, Switzerland, September 16–19, 1996, pages III-193–196.

[28] STEVE MANN. Intelligent Image Processing. page 384. John Wiley and Sons, November 2, 2001. ISBN: 0-471-40637-6.

[29] STEVE MANN. Toposculpting: Computational Lightpainting and Wearable Computational Photography for Abakographic User Interfaces. In IEEE CCECE. IEEE, 2014.

[30] STEVE MANN, RAYMOND CHUN HING LO, KALIN OVTCHAROV, SHIXIANG GU, DAVID DAI, CALVIN NGAN, AND TAO AI. Realtime HDR (High Dynamic Range) Video for Eyetap Wearable Computers, FPGA-Based Seeing Aids, and Glasseyes. In Proc. IEEE CCECE 2012. Citeseer, 2012.


[31] T. MITSUNAGA AND S. K. NAYAR. Radiometric Self Calibration. 1, pages 374–380, 1999.

[32] ANDREW S. NOETZEL. An Interpolating Memory Unit for Function Evaluation: Analysis and Design. Computers, IEEE Transactions on, 38(3):377–384, 1989.

[33] CHRIS PAL, RICHARD SZELISKI, MATTHEW UYTTENDAELE, AND NEBOJSA JOJIC. Probability Models for High Dynamic Range Imaging. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, 2, pages II-173. IEEE, 2004.

[34] BEHROOZ PARHAMI. Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, Inc., 2009.

[35] MARK A. ROBERTSON, SEAN BORMAN, AND ROBERT L. STEVENSON. Dynamic Range Improvement through Multiple Exposures. In Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on, 3, pages 159–163. IEEE, 1999.

[36] MARK A. ROBERTSON, SEAN BORMAN, AND ROBERT L. STEVENSON. Estimation-Theoretic Approach to Dynamic Range Enhancement using Multiple Exposures. Journal of Electronic Imaging, 12(2):219–228, 2003.

[37] ASLA M. SA, PAULO CEZAR CARVALHO, AND LUIZ VELHO. High Dynamic Range Image Reconstruction. Synthesis Lectures on Computer Graphics and Animation, 2(1):1–54, 2008.

[38] K. OVTCHAROV, J. STEFFAN, S. ZULFIQAR, T. AI, M. A. ALI, AND S. MANN. Real-Time HDR Video Imaging on FPGA with Compressed Comparametric Lookup Tables. In Proc. IEEE CCECE. IEEE, 2014.

[39] MICHAEL D. TOCCI, CHRIS KISER, NORA TOCCI, AND PRADEEP SEN. A Versatile HDR Video Production System. ACM Transactions on Graphics (TOG) (Proceedings of SIGGRAPH 2011), 30(4), 2011.

[40] VESA. VESA Coordinated Video Timing Generator, Revision 1.1. 2003. http://www.vesa.org.

[41] C. W. WYCKOFF. An Experimental Extended Response Film. SPIE Newslett., pages 16–20, 1962.

[42] XILINX. Spartan-6 FPGA Memory Controller - User Guide. 2010. http://www.xilinx.com.

[43] XILINX. AXI Reference Guide. 2011. http://www.xilinx.com.


[44] XILINX. 7 Series FPGAs Memory Interface Solutions - User Guide. 2012. http://www.xilinx.com.

[45] XILINX. AXI Interconnect (v1.06.a). 2012. http://www.xilinx.com.

[46] XILINX. Synthesis and Simulation Design Guide. 2012. http://www.xilinx.com.

[47] XILINX. 7 Series FPGAs CLB User Guide. 2013. http://www.xilinx.com.

[48] XILINX. 7 Series FPGAs Overview. 2013. http://www.xilinx.com.
