Light Field based 360º Panoramas - ULisboa
Light Field based 360o Panoramas
André Alexandre Rodrigues Oliveira
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisors: Prof. Fernando Manuel Bernardo Pereira
Prof. João Miguel Duarte Ascenso
Prof. Catarina Isabel Carvalheiro Brites
Examination Committee
Chairperson: Prof. José Eduardo Charters Ribeiro da Cunha Sanguino
Supervisor: Prof. João Miguel Duarte Ascenso
Members of the Committee: Prof. Pedro António Amado Assunção
November 2016
Resumo
360º panoramas offer users more immersive experiences since they allow a freer and more intuitive
navigation in the 3D visual world, notably in any desired direction. Recently, this type of content has
been increasingly used in several application domains, offering users the opportunity to understand
the 3D world more deeply, without a priori restrictions or hidden viewing angles. 360º panoramas
stimulate user interaction, leading to a growing number of viewers and to an increase in total
consumption time. The creation of 360º panoramas is usually achieved through a stitching process
that combines several images with some overlap between their fields of view.
Generally, the sensors present in conventional cameras only capture the total sum of the light
striking a given position of the lens. However, this is a limited representation of the real scene's light
field, which can be more faithfully expressed through a function characterizing the amount of light
traveling in every direction through every point in space, the so-called plenoptic function. Recently,
new sensors and cameras, called plenoptic or light field cameras, have emerged with the capacity to
capture higher dimensional representations of the visual information of the real world, for example by
using micro-lens arrays in the optical path to capture the light striking each spatial position (x,y) from
any angular direction (θ, Φ). This representation offers richer images and hence additional
functionalities such as, for example, the possibility of refocusing on any part of the image after
capture, slightly changing the user's viewpoint, relighting and recoloring, or automatically selecting
objects based on depth information, among others. With these new light field cameras, images
become (3D) volumes, changing the conventional image representation paradigm based on (2D) flat
surfaces.
Naturally, the creation of 360º panoramas using light field images instead of conventional images
is an exciting path to pursue, considering the potential additional functionalities and the constant need
to offer the user a more intense and immersive experience.
The main objective of this Master's Thesis is the development of a light field based panoramic
image creation solution, able to exploit the potential of the emerging light field cameras for the
production and consumption of 360º panoramas. This work will combine the creation, stitching,
processing, manipulation, interaction and adequate visualization of light field based 360º panoramas
in a user-friendly way. To reach the intended objective, this dissertation starts by reviewing, analyzing
and discussing the most important and representative conventional 360º panorama creation solutions
in the literature. Although research on light field based 360º panorama creation is still at an early
stage, some solutions have already been proposed in the literature. Accordingly, this dissertation also
analyzes and reviews two representative light field based 360º panorama creation solutions. Next, the
proposed solution for light field based 360º panorama creation is presented. Finally, the performance
of this solution is assessed through the presentation and analysis of some panoramas created with
the proposed solution.
Keywords: digital photography, 360º panorama creation, stitching, plenoptic function, light fields
Abstract
360º panoramas bring more intense and immersive experiences to users since they support free and
intuitive navigation in the 3D visual world, notably in any desired direction. Recently, this type of content
has been increasingly used in many application domains, providing users the chance to understand the
3D world in depth, without a priori constraints or hidden viewing angles. 360º panoramas stimulate user
interaction, leading to an increase in viewership numbers and total media consumption time. The
creation of 360º panoramas is normally achieved by means of a stitching procedure which combines
multiple images with overlapping fields of view.
Generally, the sensors present in conventional cameras merely capture the total sum of the light
impinging on a given position of the lens. This is clearly a limited representation of the real scene light
field, which can be more faithfully expressed through a well-known function characterizing the amount
of light traveling in every direction through every point in space, the so-called plenoptic function.
Recently, new sensors and cameras, the so-called light field or plenoptic cameras, have emerged with
the capacity to capture higher dimensional representations of the world's visual information, for example
using a micro-lens (i.e. lenslet) array in the optical path, which is able to capture the light for each spatial
position (x,y) coming from any angular direction (θ, Φ). This richer imaging representation offers
additional functionalities such as refocusing on any part of the image after capture, slightly changing
the user viewpoint, relighting and recoloring, or selecting objects automatically based on depth
information, among others. With these new light field cameras, images become (3D) volumes, changing
the conventional imaging representation model that uses (2D) flat planes.
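The dimensional reduction described above can be written compactly. As a sketch, assuming the parametrizations commonly used in the light field literature (the 7D plenoptic function and its two-plane 4D reduction, matching the 7D and 4D acronyms listed later in this document):

```latex
% Full plenoptic function: radiance observed at position (x, y, z), in direction
% (\theta, \phi), at wavelength \lambda and time t -- a 7D function.
P(x, y, z, \theta, \phi, \lambda, t)

% For a static, monochromatic scene observed from outside its convex hull,
% radiance is constant along a ray, so the function reduces to the 4D light
% field, often parametrized by a ray's intersections (u, v) and (s, t) with
% two parallel reference planes:
L(u, v, s, t)
```

It is this 4D function that lenslet-based cameras such as the Lytro sample, and that a light field panorama must stitch consistently across all its dimensions.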
Naturally, the creation of 360º panoramas using light field images instead of conventional images is an
exciting path to pursue, considering the potential additional functionalities and the constant need to
offer the user a more intense and immersive experience.
The main objective of this Master of Science Thesis is the development of a light field based 360º
panorama creation solution able to exploit the potential of the emerging light field cameras for 360º
panorama production and consumption. This work will combine the creation, stitching, processing,
manipulation, interaction and adequate visualization of light field based 360º panoramas in a user-
friendly way. To reach the intended objective, this dissertation starts with the review, analysis and
discussion of the most important and representative conventional 360º panorama creation solutions in
the literature. While the research area of light field based 360º panorama creation is still in its infancy,
some first solutions already exist in the literature. Thus, this dissertation also reviews and analyzes two
representative light field based 360º panorama creation solutions. After that, the solution proposed for
light field based 360º panorama creation is presented. Finally, the performance of this solution will be
discussed through the presentation and analysis of some representative light field panoramas created
with the proposed solution.
Keywords: digital photography, 360º panorama creation, stitching, plenoptic function, light field
Table of Contents
Resumo ................................................................................................................................................... iii
Abstract.....................................................................................................................................................v
List of Figures ...........................................................................................................................................x
List of Tables ......................................................................................................................................... xiii
Acronyms ............................................................................................................................................... xiv
1. Introduction ...................................................................................................................................... 1
1.1. Context and Motivation ................................................................................................................ 1
1.2. Objectives and Structure ............................................................................................................. 2
2. State-of-the-Art on Conventional 360º Panoramas Creation .......................................................... 4
2.1. Proposing an Architecture for Conventional 360º Panorama Creation ....................................... 4
2.2. Types of 360º Panoramas ........................................................................................................... 8
2.3. Reviewing the Main Conventional 360º Panorama Creation Solutions .................................... 12
2.3.1. Solution 1: Panoramic Image Creation Combining Patch-based Global and Local Alignment
Techniques ............................................................................................................................................ 12
A. Objectives and Technical Approach .............................................................................................. 12
B. Architecture and Main Tools .......................................................................................................... 13
C. Performance and Limitations ......................................................................................................... 14
2.3.2. Solution 2: Panoramic Image Creation using Invariant Feature based Alignment and Multi-
Band Blending ....................................................................................................................................... 16
A. Objectives and Technical Approach .............................................................................................. 16
B. Architecture and Main Tools .......................................................................................................... 17
C. Performance and Limitations ......................................................................................................... 19
2.3.3. Solution 3: Panoramic Image Creation using Invariant Feature based Alignment and
Seamless Image Stitching ..................................................................................................................... 21
A. Objectives and Technical Approach .............................................................................................. 21
B. Architecture and Main Tools .......................................................................................................... 21
C. Performance and Limitations ......................................................................................................... 23
2.3.4. Solution 4: Panoramic Image Creation using a Locally Adaptive Alignment Technique based
on Invariant Features ............................................................................................................................. 25
A. Objectives and Technical Approach .............................................................................................. 25
B. Architecture and Main Tools .......................................................................................................... 25
C. Performance and Limitations ......................................................................................................... 27
3. Light Field based 360º Panoramas Creation ................................................................................. 28
3.1. Basic Concepts .......................................................................................................................... 28
3.2. Reviewing the Main Light Field based 360º Panorama Creation Solutions ............................... 31
3.2.1. Solution 1: Light Field based 360º Panorama Creation using Invariant Features based
Alignment ............................................................................................................................................... 31
A. Objectives and Technical Approach .............................................................................................. 31
B. Architecture and Main Tools .......................................................................................................... 32
C. Performance and Limitations ......................................................................................................... 33
3.2.2. Solution 2: Light Field based 360º Panorama Creation using Regular Ray Sampling ......... 34
A. Objectives and Technical Approach .............................................................................................. 34
B. Architecture and Main Tools .......................................................................................................... 35
C. Performance and Limitations ......................................................................................................... 36
4. Light Field based 360º Panorama Creation: Architecture and Tools ............................................ 39
4.1. Global System Architecture and Walkthrough ........................................................................... 39
4.2. Light Field Toolbox Processing Description .............................................................................. 42
4.3. Main Tools: Detailed Description ............................................................................................... 46
4.3.1. Central Perspective Images Registration Processing Architecture ....................................... 46
4.3.2. Composition Processing Architecture ................................................................................... 49
5. Light Field based 360º Panoramas Creation: Assessment ........................................................... 53
5.1. Test Scenarios and Acquisition Conditions ............................................................................... 53
5.1.1. Test Scenarios ....................................................................................................................... 53
5.1.2. Acquisition Conditions ........................................................................................................... 56
5.2. Example Results and Analysis .................................................................................................. 57
5.2.1. Perspective Shift Capability Assessment .............................................................................. 57
5.2.2. Refocus Capability Assessment ............................................................................................ 69
6. Summary and Conclusions ............................................................................................................ 78
6.1. Summary and Conclusions ........................................................................................................ 78
6.2. Future Work ............................................................................................................................... 79
Bibliography ........................................................................................................................................... 87
Appendix A ............................................................................................................................................ 82
List of Figures
Figure 1 – Light Field Cameras: Lytro (a) first and (b) second generation camera, respectively [1]; (c)
Raytrix camera [2]. .................................................................................................................................. 2
Figure 2 - Proposed architecture for the creation and interactive consumption of conventional 360º
panoramas. .............................................................................................................................................. 4
Figure 3 – The 3D sphere of vision displaying all the acquired images [14]. .......................................... 7
Figure 4 - Panorama projection impact and corresponding example: (a) and (b) Cylindrical projection
[14] [22] ; (c) and (d) Spherical projection [14] [22]. ................................................................................ 9
Figure 5 - Panorama projection impact and corresponding example: (a) and (b) Rectilinear projection
[14] [22]; (c) and (d) Fisheye projection [14] [22]; (e) and (f) Stereographic projection [14] [22]. ......... 11
Figure 6 - Panorama examples: (a) Sinusoidal projection; (b) Panini projection [22]. .......................... 12
Figure 7 – Architecture of the panoramic image creation solution combining pixel-based global and local
alignment techniques [23]...................................................................................................................... 13
Figure 8 – Mitigating misregistration errors by applying global alignment: (a) image mosaics with visible
gaps/overlaps; (b) corresponding image mosaics after applying the global adjustment technique; (c) and
(d) close-ups of left middle regions of (a) and (b), respectively [23]. .................................................... 15
Figure 9 – Mitigating the effect of motion parallax by applying local alignment: (a) image mosaic with
parallax; (b) image mosaic after applying a single deghosting step (patch size of 32); (c) image mosaic
after applying three times deghosting steps (patch sizes of 32, 16 and 8) [23]. ................................... 16
Figure 10 - Architecture of the invariant feature based automatic panoramic image creation solution. 17
Figure 11 – Recognizing panorama capability: (a) image collection containing connected sets of images
that will later form different panoramas and noise images; (b) 4 different blended panoramas outputted
by the panorama creation solution [24]. ................................................................................................ 19
Figure 12 – Panoramas produced: (a) without applying gain compensation technique; (b) with gain
compensation technique; (c) with both gain compensation and multi-band blending technique [24]. .. 20
Figure 13 - Architecture of the invariant feature based seamless HDR panorama creation solution [30].
............................................................................................................................................................... 22
Figure 14 - HDR panorama creation [30]: (a) Registered input images; (b) Results after applying the first
step of image selection: reference labels (left), resulting panoramic image (center) and tone-mapped
version of the panoramic image created (right); (c) Results after applying the second step of image
selection: final reference label (left), HDR panorama (center) and tone-mapped version of the HDR
compressed panorama (right). .............................................................................................................. 24
Figure 15 – Architecture of panoramic image creation solution using a locally adaptive alignment
technique based on invariant features. ................................................................................................. 25
Figure 16 – Panoramas created with the AutoStitch and APAP solutions [36]. ..................................... 27
Figure 17 – Illustrating the plenoptic function [37]. ............................................................................... 29
Figure 18 – Light field cameras and imaging acquisition system: (a) Lytro Illum camera [1]; (b) Raytrix
camera [2]; (c) imaging acquisition system [39]; (d) micro images formed behind the micro-lens array.
............................................................................................................................................................... 30
Figure 19 – Architecture of the creation and interactive consumption of the light field based panoramic
image creation using invariant features based alignment. .................................................................... 32
Figure 20 – Panoramic Image created showing the regions of overlap between the all-in-focus grayscale
images [40]. ........................................................................................................................................... 33
Figure 21 - Architecture of the light field based panoramic image creation solution using regular ray
sampling. ............................................................................................................................................... 35
Figure 22 - Panoramas created with the AutoStitch solution and the light field based 360º panorama
creation solution reviewed in this section [42]. ...................................................................................... 37
Figure 23 – Illustration of the stitching process of light field images (represented as perspective images
stacks). .................................................................................................................................................. 40
Figure 24 – Global system architecture of the proposed light field based 360º panorama creation
solution. ................................................................................................................................................. 40
Figure 25 - Lytro Illum light field camera: (a) GRBG Bayer-pattern filter mosaic [49]; and (b) imaging
acquisition system [50]. ......................................................................................................................... 41
Figure 26 - Light Field Toolbox software: light field images processing architecture............................ 43
Figure 27 - Hexagonal micro-lens array: (a) close up [52]; and (b) example of a white image and
associated estimated lenslet centers represented as red dots [51]. ..................................................... 43
Figure 28 - Example of: (a) calibration pre-processed light field checkerboard image; (b) checkerboard
corners identification [51]....................................................................................................................... 44
Figure 29 - Example of an image: (a) before and (b) after demosaicing [53]. ...................................... 45
Figure 30 - Example of a demosaiced raw lenslet image before devignetting [43]............................... 45
Figure 31 – Central perspective images registration architecture of the proposed light field based 360º
panorama creation solution. .................................................................................................................. 47
Figure 32 – Features detected and extracted from 2 overlapping central perspective images. ........... 47
Figure 33 – Image Matching after applying RANSAC algorithm (inlier matches). ................................ 48
Figure 34 – Wave correction examples: (a) without and (b) with applying the panorama straightening
technique. Both examples presented are the final panorama that was obtained after all composition
steps. ..................................................................................................................................................... 49
Figure 35 – Composition architecture of the proposed light field based 360º panorama creation solution.
............................................................................................................................................................... 50
Figure 36 - Image warping example: (a) before (b) after applying image warping in a central perspective
image. .................................................................................................................................................... 51
Figure 37 – Image mask example. ........................................................................................................ 51
Figure 38 – Central view for the Room with toys 1 light field 360º panorama corresponding to test
scenario A.1. .......................................................................................................................................... 54
Figure 39 – Central view for the Room with toys 2 light field 360º panorama corresponding to test
scenario A.2. .......................................................................................................................................... 54
Figure 40 – Light field 270º panoramas corresponding to test scenario B.3: (a) Sea landscape; and (b)
Park landscape. ..................................................................................................................................... 55
Figure 41 – Central view for the Empty Park light field 300º panorama corresponding to test scenario
C.3. ........................................................................................................................................................ 55
Figure 42 – Full acquisition system used. ............................................................................................. 56
Figure 43 – Light field panorama presented as a 2D matrix of perspective panoramic images. .......... 58
Figure 44 – Extreme left perspective panorama example (position (8,1)) with undesired effects (such as
vignetting and blurring): (a) perspective panorama located at the border of the perspective panoramas;
(b) first and (c) second perspective images (extracted from the first and second acquired light field
images belonging to the presented light field panorama) used to compose the presented perspective
panorama............................................................................................................................................... 59
Figure 45 – Five perspectives extracted from the Room with toys 1 light field 360º panorama created for
the test scenario A.1: (a) central perspective (8,8); (b) left perspective (8,3); (c) right perspective (8,13);
(d) top perspective (2,8); and (e) bottom perspective (14,8). ................................................................ 60
Figure 46 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the
left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8);
lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13). ............................ 61
Figure 47 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top
perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly
(c) and (f) correspond to the two close-ups from the bottom perspective (14,8). ................................. 62
Figure 48 - Perspective extracted from the Room with toys 2 light field 360º panorama created for the
test scenario A.2: (a) central perspective (8,8); (b) and (c) two close-ups presenting camera
overexposure problems and 2D stitching artifacts. ............................................................................... 64
Figure 49 – Five perspectives extracted from the Sea landscape light field 270º panorama created for
the test scenario B.3: (a) central perspective (8,8); (b) left perspective (8,3); (c) right perspective (8,13);
(d) top perspective (2,8); and (e) bottom perspective (14,8). ................................................................ 66
Figure 50 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the
left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8);
lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13). ............................ 67
Figure 51 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top
perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly
(c) and (f) correspond to the two close-ups from the bottom perspective (14,8). ................................. 68
Figure 52 – Three depth planes extracted from the Room with toys 1 light field 360º panorama and two
corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope = - 0.05
where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.25 where
(f) and (g) are the corresponding close-ups; (c) depth plane extracted with slope = 0.6 where (h) and (i)
are the corresponding close-ups. .......................................................................................................... 71
Figure 53 - Last depth plane extracted from the Room with toys 2 light field 360º panorama and two
corresponding close-ups: (a) depth plane extracted with slope = 0.6 where (b) and (c) are the
corresponding close-ups. Red rectangles highlight the close-ups that will be used to help visualizing the
focus in specific parts of the light field image. ....................................................................................... 72
Figure 54 - Three different depth planes extracted from the Sea landscape light field 270º panorama
and two corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope
= 0.15 where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.45
where (f) and (g) are the corresponding close-ups; (c) depth plane extracted with slope = 0.55 where
(h) is the corresponding close-up. ......................................................................................................... 74
Figure 55 - Three different depth planes extracted from the Park landscape light field 270º panorama
and two corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope
= 0 where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.15
where (f) and (g) are the corresponding close-ups; (c) depth plane extracted with slope = 0.25 where
(h) is the corresponding close-up. ......................................................................................................... 76
Figure 56 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the
left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8);
lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13). ............................ 83
Figure 57 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top
perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly
(c) and (f) correspond to the two close-ups from the bottom perspective (14,8). ................................. 84
Figure 58 - Two different depth planes extracted from the Empty park light field 300º panorama and two
corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope = 0.25
where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.5 where (f)
and (g) are the corresponding close-ups. .............................................................................................. 85
List of Tables
Table 1 – Test scenarios characteristics. .............................................................................................. 54
Acronyms
2D Two Dimensional
3D Three Dimensional
4D Four Dimensional
7D Seven Dimensional
APAP As-Projective-As-Possible
CMOS Complementary Metal-Oxide Semiconductor
DLT Direct Linear Transformation
EXIF Exchangeable Image File Format
FOV Field of View
GRBG Green Red Blue Green
HDR High Dynamic Range
JSON JavaScript Object Notation
LF Light Field
LFT Light Field Toolbox
LMS Least Median of Squares
PNG Portable Network Graphics
RANSAC RANdom SAmple Consensus
RGB Red Green Blue
RMSE Root Mean-Squared Error
SIFT Scale-Invariant Feature Transform
SNR Signal-to-Noise Ratio
SURF Speeded Up Robust Features
Chapter 1
1. Introduction
This chapter will introduce the topic of this Thesis, which is the creation of light field based 360º
panoramas. It begins by presenting the context and motivation behind this work, proceeding to the
definition of its objectives and, finally, to the structure of this document.
1.1. Context and Motivation
Photography is the process of recording visual information by capturing light rays on a light-sensitive
recording medium, e.g. film or digital sensors. The images resulting from this process are one of the
most important communication media for human beings, largely employed in a variety of application
areas, from art and science to all types of businesses. Over the years, photography related methods
and gadgets, e.g. digital cameras, have become increasingly refined to better suit growing user
needs. However, the most common photography cameras developed to date present an important
limitation: whether analog or digital, they have a limited field of view (i.e. the part of the visual world that
is visible through the camera at a particular position and orientation in space), which is in general much
smaller than the human field of view. In summary, it is not an easy task to encompass wide fields of
view in a single camera shot.
With the desire to capture, in a single image, wide fields of view, panoramic photography has
emerged as a technique that combines a set of partially overlapping elementary images of a visual
scene acquired from a single camera location to obtain an image with a wide field of view. Contrary to
what might be thought, this type of photography is not that recent; however, with the emergence of digital
cameras, a number of new possibilities have opened up for panoramic image creation.
Conventionally, the creation of 360º panoramas involves a sequence of different steps, starting
with the acquisition of several images representing different parts of the scene and ending with a
stitching process that combines the multiple overlapping images, resulting in the desired panorama.
However, this procedure suffers from several limitations associated with the conventional imaging
representation paradigm, in which images are just a collection of rectangular 2D projections for
some specific wavelength components. Generally, conventional cameras can merely capture the total
sum of the light rays that reach a certain point in the lens using the two available spatial dimensions at
the camera sensor (vertical and horizontal resolutions), thus leading to loss of valuable information from
the real scene light field. This valuable information is the directional distribution of the light of the scene,
which can be used in a number of different ways to improve the creation of panoramas and also to
provide several new functionalities to the users. Without the use of this precious visual information, the
user experience becomes greatly restricted. The full real scene light field can be expressed by a
well-known function characterizing the amount of light traveling through every point in space, in
every direction, for any wavelength, along time: the so-called (7D) plenoptic function. For this
reason, the search for novel, more complete representations of the world visual information (i.e. higher
dimensional representations) has become a hot research field, in a demand to offer the users more
immersive, intense and faithful experiences.
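To make the dimensionality concrete (a standard formulation from the light field literature, not specific to this Thesis), the plenoptic function may be written as:

```latex
P = P(x, y, z, \theta, \phi, \lambda, t)
```

where (x, y, z) is the observation position, (θ, φ) the ray direction, λ the wavelength and t the time instant; dropping wavelength and time, and assuming the radiance constant along each ray in free space, reduces it to the 4D light field L(u, v, s, t), commonly parameterized by the intersections of each ray with two parallel planes.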
Recently, these problems have started to be addressed with the emergence of new sensors and
cameras, e.g. the plenoptic or light field cameras illustrated in Figure 1, which allow higher dimensional
representations of the visual information by capturing the angular light information. These new cameras
can conveniently capture and record more light information thanks to an innovative design in which a
micro-lens array captures the light for each position and from each angular direction, i.e. they
capture and record a 4D light field. Each of these micro-lenses captures a slightly different
perspective/view from the scene, allowing these cameras to record not only the location and intensity of
light, as a conventional digital camera does, but also to differentiate and record the light intensity for
each incident direction.
Figure 1 – Light Field Cameras: Lytro (a) first and (b) second generation camera, respectively [1]; (c) Raytrix camera [2].
This important characteristic allows capturing a much richer visual information representation of the
scene, which can be used to overcome the limitations related to the conventional imaging representation
and capture. For instance, this richer representation of the visual information brings additional interesting
features like the ability to refocus any part of the image a posteriori, to relight and recolor the scene, and
to slightly change the user viewpoint, among others. All these new capabilities will inevitably lead to the
reinvention of concepts and functionalities associated with digital photography, and thus the 360º
panoramic image creation solutions will also suffer the impact of this new imaging representation
paradigm and associated available cameras.
1.2. Objectives and Structure
In the context described above, the main objective of this Master Thesis is the design, implementation
and assessment of a novel light field imaging based 360º panorama creation solution based on the best
available technologies. To achieve this goal, this dissertation starts by proposing a basic architecture
for the panorama creation process and then reviewing and analyzing the most relevant conventional
360º panorama creation solutions in the literature. After reviewing the panorama creation solutions for
conventional imaging,
this dissertation will move to the light field representation paradigm based on the plenoptic function.
Even though the research on light field based 360º panorama creation is still at an early stage, some
first solutions already exist in the literature. Thus, this dissertation will first review the few available
light field based 360º panorama creation solutions. Finally, the Thesis will address the
design, implementation and assessment of a powerful light field based 360º panorama creation
solution, extending the available (non-light field based) methods and software.
To report the work developed, this dissertation is structured as follows: Chapter 2 will review the
state-of-the-art on conventional 360º panorama creation by, firstly, proposing a global architecture for
360º panorama creation, then presenting the most common panorama types and,
lastly, reviewing the most relevant conventional 360º panorama creation solutions in the literature.
Then, Chapter 3 will review the state-of-the-art on light field based panorama creation by, firstly,
presenting the basic concepts behind the light field representation paradigm and then reviewing two
available light field based panorama creation solutions. Next, Chapter 4 will present the light field based
360º panorama creation solution proposed, starting by describing the global system architecture and
walkthrough, followed by a detailed description of the main parts, namely the light field data pre-
processing module and the key modules used to create the 360º panorama light field image. Then,
Chapter 5 will introduce the performance analysis of the developed panorama creation solution, starting
by presenting the test scenarios and adopted acquisition conditions, and then moving to the visual inspection
and analysis of some representative light field panoramas. Lastly, Chapter 6 will conclude with a
summary and the future work plan.
Chapter 2
2. State-of-the-Art on Conventional 360º Panoramas
Creation
This chapter will present the main concepts, approaches and tools involved in the creation of
conventional 360º panoramas, also known as full-view panoramas. With this goal in mind, it starts by
presenting a global architecture for the creation of 360º panoramas, then proceeding to the presentation
of several types of 360º panoramas and their characteristics. Finally, a brief review of some of the main
conventional panorama creation solutions available in the literature is presented.
2.1. Proposing an Architecture for Conventional 360º Panorama Creation
Creating a full-view 360º panorama is a complex task that involves a series of steps, from the acquisition
of image data from the scene to the final generation of a seamless 360º panorama, to be experienced
through a rendered view whose position is interactively selected by the user. The first target of this
chapter is to propose a global architecture for the creation of 360º panoramas designed to embrace the
main approaches available in the literature. The proposed architecture is presented in Figure 2:
Figure 2 - Proposed architecture for the creation and interactive consumption of conventional 360º panoramas.
In the following, a brief description of the various modules present in the proposed architecture is
presented:
Image Acquisition: This first step regards the acquisition of all images representing the 3D world
scene. In this architecture, it is considered that the camera performing the acquisition stays in the
same position while rotating around its nodal point or no-parallax point; this requires that the
camera is carefully mounted on a tripod, or held level by hand, at a chosen stable position that is
kept throughout the acquisition of all images. One image is taken for each rotation step of the
camera, so that the final set of images covers the full scene while keeping fixed overlapping
areas, which greatly facilitates their later alignment and makes it possible to produce a full-view
panorama with all the image contents fitted into a single frame [3].
Pre-Processing: In this step, some pre-processing of the acquired images may be needed, e.g.
to minimize differences between the used camera-lens combination and an ideal lens model with
the goal of correcting some optical defects such as distortion and different exposures between
the images [4].
Calibration: In this step, some important calibration data is extracted, notably the camera intrinsic
and extrinsic parameters [3], which are computed based on the acquired images/textures. The
intrinsic or internal camera parameters allow a mapping between the coordinates of each image
point and the corresponding coordinates in the camera reference frame (relying only on camera
characteristics). The extrinsic or external camera parameters relate the orientation and location
of the camera center with a known world reference coordinate system [5].
Registration: In this step, the set of acquired images is registered, meaning that the images are
brought into a single coordinate system where they fit together with the corresponding
overlaps. There are two main types of techniques in the literature to perform the image
registration process:
- Direct (pixel-based) techniques: These solutions directly minimize the pixel-to-pixel
intensity dissimilarities to align the images. The main advantage of the direct (pixel-based)
techniques is that they make optimal/full use of the information available for the image
alignment process since they consider the contribution of every single pixel from all images;
their main disadvantage is that they have a limited range of convergence (they also need to
be initialized), implying that for photo-based panoramas they fail too often to be useful [6].
These techniques typically involve motion based alignment where a suitable error metric is
used to compare the pixel intensities in all images and a convenient search technique is
used to find the alignment where the most pixels agree. Generally speaking, there are two
ways of performing the alignment search: i) the first is to exhaustively try all possible
alignments, i.e. to perform a full search [6]; and ii) the alternative, faster way is to perform a
hierarchical coarse-to-fine search, i.e. a hierarchical motion estimation [7]. There are many
techniques for performing the image registration, notably by using methods based on Fourier
analysis [7]. In general, for panoramic applications, high accuracy is required in the
alignment process to obtain acceptable results; thus, it is necessary to use sub-pixel precision
by adopting incremental methods, e.g. based on the Taylor series expansion [6].
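As an illustration only (not code from this Thesis), the full search and its hierarchical coarse-to-fine variant can be sketched for a simple translational motion model, using the sum of squared differences as error metric; the image contents, pyramid depth and the circular shift used for warping are assumptions of the sketch:

```python
import numpy as np

def ssd(a, b):
    """Error metric: sum of squared pixel intensity differences."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float((d * d).sum())

def full_search(ref, tgt, max_shift=4):
    """Exhaustively try all integer translations of tgt (full search) and
    return the (dy, dx) shift that best aligns it with ref."""
    best_err, best = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = ssd(ref, np.roll(np.roll(tgt, dy, axis=0), dx, axis=1))
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

def coarse_to_fine(ref, tgt, levels=3):
    """Hierarchical search: estimate the shift on coarse (subsampled)
    versions of the images first, then refine it at each finer level."""
    dy, dx = 0, 0
    for lvl in reversed(range(levels)):
        s = 2 ** lvl
        warped = np.roll(np.roll(tgt, dy, axis=0), dx, axis=1)
        rdy, rdx = full_search(ref[::s, ::s], warped[::s, ::s], max_shift=2)
        dy, dx = dy + rdy * s, dx + rdx * s
    return dy, dx
```

The hierarchical variant searches only a small window (±2) at each level but covers a larger total displacement, which is the speed advantage mentioned above.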
- Feature-based techniques: These methods work by extracting a sparse set of interest
points in each image and then matching these points to equivalent points in other images in
the collection [8]. The main advantage of feature-based techniques is that they are
computationally more efficient than the direct techniques and have a better range of
convergence without the need for initialization. The main disadvantage is that they have to
deal with image regions that do not fit well the selected motion model when matching the
points of interest in an image with similar points in other images, due to either moving
objects in the scene [9] or parallax differences, among others [6]. Registration solutions based
on features typically consider two main steps:
Feature Detection and Extraction: In this step, the goal is to detect and extract a set of
distinctive local features for previously detected keypoints (or points of interest) in each
image. Local features can be defined as distinctive parts of an image, like edges, corners
or blobs (i.e. regions of interest); it is desirable that these features are present in the
highest number of images for an easier alignment [8]. To extract those features, a
keypoint detector is first employed, which corresponds to a low-level image processing
operator that examines every pixel to check if there is a good feature present at that pixel.
The most important property for a keypoint detector is its representability, meaning that
keypoints should correspond to positions with high image expressive power. Some other
desirable characteristics for keypoint detectors include invariance to image noise, scale,
rotation and translation, affine transformations and blur. After finding the set of distinctive
keypoints using keypoint detectors, local feature descriptors are used to describe the
texture on a patch (in general a square) defined around the corresponding keypoint [6].
The most common local feature descriptors are the Scale-Invariant Feature Transform
(SIFT) [10] and Speeded Up Robust Features (SURF) [11], which offer scale and rotation
invariant properties. Feature descriptors must also be robust to small deformations or
keypoint localization errors and allow finding the corresponding pixel locations in other
images which capture the same information about spatial intensity patterns under
different conditions or perspectives [6].
Feature Matching: This step performs a match between the set of features detected and
extracted in the previous step in various images. With this goal in mind, it is necessary to
determine which features correspond to the same locations in different images and then
determine the appropriate mathematical model (i.e. estimate the homography) which
relates the features found in a given image with the corresponding features in another
image. After finding this model, it is expected that some features do not fit well in this
model, so they are classified as outlier features in opposition to the features fitting well in
the model, called inlier features; the outlier features are removed from the model. Finally,
the matching between the neighboring images is performed using the inlier features [6].
The two main methods to perform the selection between inlier and outlier features and
the appropriate matching models are RANSAC (RANdom SAmple Consensus) [12] and
LMS (Least Median of Squares) [13].
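As an illustration of the inlier/outlier selection principle only (RANSAC is normally applied here to homography estimation; a simple 2D translational model and synthetic point matches are used to keep the sketch short):

```python
import numpy as np

def ransac_translation(src, dst, iters=100, tol=1.0, seed=0):
    """RANSAC for a 2D translation model: repeatedly fit the model to a
    random minimal sample (a single correspondence here), count the
    matches that agree with it (inliers), keep the best model, and
    finally refit it using only the inliers (outliers are discarded)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                            # model from minimal sample
        inliers = np.linalg.norm(src + t - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # least-squares refit on the inlier set only
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers
```

The same sample-score-refit loop applies to the homography case, with a minimal sample of four correspondences instead of one.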
Global Alignment: In this step, the goal is to find a generally consistent set of alignment
parameters that reduce the accumulated registration errors between all pairs of images, thus
obtaining an optimally registered panoramic image. Generally, in most panoramic applications, it
is necessary to register more than just a pair of images and thus it is fundamental to extend the
pairwise matching criteria used to establish an alignment between a simple pair of images to a
global energy function that involves all images and their pose parameters (parameters coming
from the calibration step) [6]. For the creation of a 360º panorama, it is necessary to find the
precise location of all the acquired images in a single coordinate system which defines the 3D
sphere of vision as shown in Figure 3.
Figure 3 – The 3D sphere of vision displaying all the acquired images [14].
To make this happen, every pixel in each image will be represented in spherical coordinates
(yaw, pitch and roll) and thus associated with a position in the 3D sphere of vision surface. The most
relevant technique to adjust the pose parameters for a collection of overlapping images, thus
finding the optimal overall registration, is called bundle adjustment [15]. After combining multiple
images of the same scene into the 3D sphere of vision, it may be necessary to perform local
adjustments such as parallax removal to reduce double images and blurring due to local
misregistration [6]. If the acquisition provides an unordered set of images to register, it is
necessary to identify which images have overlapping areas with each other and fit them together
to form one or more different panoramas, using a process called recognizing panoramas [16].
Projection: This step will project the aligned images into a final composing surface which
depends on the type of selected panorama. Section 2.2 will introduce the main types of available
360º panoramas, which depend on the choice of the projection. In this process, each image is
successively projected according to the chosen composing surface coordinates, i.e. the mapping
between each source image's pixels and the composite surface is performed, giving the
360º panorama its final visual signature [6].
Blending: Since there are typically overlapping areas and possibly moving objects, this step will
define the pixel values in each position of the final panorama, notably how to optimally weight or
blend the pixels in such a way that visible seams (due to exposure differences), blurring (due to
misregistration) and ghosting (due to moving objects) effects are minimized [6]. This final step
provides the 360º panorama, which should then allow rendering attractive looking views.
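As a minimal sketch of the weighting idea only (the per-image weight masks, e.g. distance-to-border weights, and the RGB image shape are assumptions; real blenders are more elaborate), feather blending of registered, overlapping images can be written as:

```python
import numpy as np

def feather_blend(images, masks):
    """Feather (weighted average) blending: each output pixel is the sum
    of the overlapping images' pixels weighted by per-image masks, so
    that seams due to exposure differences fade out gradually."""
    acc = np.zeros(images[0].shape, dtype=np.float64)
    wsum = np.zeros(images[0].shape[:2], dtype=np.float64)
    for img, mask in zip(images, masks):
        acc += img.astype(np.float64) * mask[..., None]
        wsum += mask
    wsum = np.maximum(wsum, 1e-12)   # avoid division by zero outside coverage
    return acc / wsum[..., None]
```

Where only one image contributes, its pixels pass through unchanged; in overlap regions the weighted average hides the seam.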
Rendering: In this final step, the created panorama is rendered with appropriate tools to create
a virtual tour, i.e. an appropriate view for each user viewing direction. This is possible by
projecting the created 360º panorama onto a spherical grid or map representing the 360º sphere
of vision around the perspective point where the user's eyes are positioned [17]. The user may
interact with the panorama, e.g. using his/her mouse to rotate it in all directions, navigate
through the whole scene, zoom in and out, etc., thus enjoying an interactive
user experience.
2.2. Types of 360º Panoramas
Panoramas are basically defined by the type of projection used for their creation; the projection is one
of the modules included in the architecture proposed in the previous section. There are many different
types of panoramas, some of them full-view 360º panoramas and others only considering a limited part
of the 3D sphere of vision. Each of these different types of panoramas is characterized by a specific
geometric surface associated to the projection transformation equations, e.g. rectilinear panoramas
among many others. The different projections, also called transformations, that can be applied to the
set of acquired images produce a variety of available panorama types [18]. Each of the projections has
its specific field of view (FOV) corresponding to the extent of the visible world that may be presented to
the user when interacting with the resulting panorama.
The panoramic projections differ from each other both in their mathematical definition and in the
resulting panorama characteristics; thus, each type has its specific attributes and limitations [18]. Naturally,
there is some inevitable distortion when mapping the 3D sphere of vision onto a 2D flattened image.
This happens because, with the increase of the viewing angle, the viewing arc becomes more curved;
this effect is also where the differences between the various panorama projection types become more
evident. Typically, each panorama projection type tries to reduce one type of distortion at the expense
of other types of distortions, thus the decision of what panorama projection should be used largely
depends on the application scenario [19].
Although there is a large diversity of panorama types, the reality is that there are just a few which are
rather popular. There are also some more complex panorama types with some additional properties,
e.g. resulting from the combination of two or more basic panorama types. Some of the most common
projections and thus panorama types are:
Rectangular projections: In this projection, the horizontal distance is proportional to the
horizontal viewing/rotation angle or yaw angle (the horizontal field of view is an angle up to 360º)
and the vertical distance is proportional to the vertical viewing angle, i.e. the angle from
below to above the horizon, or pitch angle. There are several types of rectangular projections,
notably:
- Cylindrical projection: This type of projection results from wrapping the 3D sphere of
vision with a 2D plane forming a cylinder tangent to the equator, while light is projected
from the center of the sphere outwards. The cylindrical projection makes it possible to
produce panoramas with a large latitude range (vertical viewing range), e.g. larger than 120º.
Although it can cover even larger latitudes, near the poles the panorama becomes highly
distorted, making a large range of latitudes not really usable; in practice, the maximum
range is typically 180º for both FOVs [20]. This is related to the fact that this
projection shows all the vertical straight lines in the scene as straight lines in the final
panorama [21]. Figure 4(a) shows the impact produced by this projection on a globe and
Figure 4(b) shows an example of this type of panorama.
- Spherical projection (or equirectangular projection): This type of projection transforms
all the points in the 3D sphere of vision into latitude and longitude coordinates which are
directly converted into horizontal and vertical coordinates in a 2D flattened image. This
projection preserves all vertical lines and converts the horizon into a straight line across
the middle of the image (but does not preserve the remaining horizontal lines). The north
and south poles of the 3D sphere of vision are stretched across the entire width of the 2D
flattened resulting image [20]. The maximum FOV for this projection is up to 360º both in
the vertical and horizontal directions [21]. Figure 4(c) shows the impact produced by this
projection on a globe and Figure 4(d) shows an example of this type of panorama.
- Mercator projection: This type of projection represents a trade-off between the
Cylindrical and Spherical projections as it provides less vertical stretching and a larger
usable vertical FOV than the Cylindrical projection but shows more line curvature [19]. This
projection can be used up to 360º FOV horizontally and up to 180º FOV vertically. One of
the variations of this type of panorama is the Transverse Mercator projection which
corresponds to the 90º rotation of the traditional Mercator projection; this projection is
appropriate for very tall vertical panoramas [19].
Figure 4 - Panorama projection impact and corresponding example: (a) and (b) Cylindrical projection [14] [22]; (c) and (d) Spherical projection [14] [22].
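As a minimal illustration of the spherical (equirectangular) mapping just described (the panorama width and height, with width equal to twice the height, and the axis conventions are assumptions of the sketch), a 3D viewing direction can be converted to panorama pixel coordinates via its longitude and latitude:

```python
import math

def dir_to_equirect(x, y, z, width, height):
    """Map a 3D viewing direction (x right, y up, z forward) to
    equirectangular pixel coordinates: longitude spans the 360º image
    width, latitude spans the 180º image height."""
    lon = math.atan2(x, z)                                  # -pi .. pi (yaw)
    lat = math.asin(y / math.sqrt(x * x + y * y + z * z))   # -pi/2 .. pi/2
    u = (lon / (2.0 * math.pi) + 0.5) * width               # horizontal pixel
    v = (0.5 - lat / math.pi) * height                      # vertical pixel (0 = north pole)
    return u, v
```

The inverse of this mapping, applied per output pixel, is what projects the registered images onto the equirectangular composing surface.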
Azimuthal projections: These projections are characterized by rotational symmetry around
the center of the image and may take several forms:
- Rectilinear projection: This type of projection can be imagined as placing a flat 2D plane
tangent to the 3D sphere of vision at a single point, and projecting the light from the
sphere’s center. It has the property of preserving all the straight lines in the real 3D space
into the final projected panoramic image. The maximum FOV for this type of projection is,
for both directions, an angle up to 180º, making it inappropriate for images with very large
angles of view [21]. For large angles of view, it can exaggerate the perspective of the
objects in the panorama, which appear distorted at the edges [19]. Figure 5(a) shows the
impact produced by this projection on a globe and Figure 5(b) shows an example of this
type of panorama. There is a sub-class of this projection type called Cubic projection that
organizes the images like the faces of a cube (90º x 90º FOVs) viewed from its center,
thus maintaining all straight lines straight [20].
- Fisheye projection: This type of projection creates a 2D flattened grid where the distance
from the center of the image to a certain point is proportional to the viewing angle; this
implies that straight lines become more curved as they move away from the center of the
final panorama. One of the limitations is that both the vertical and horizontal FOVs must
be 180º or less to fit the projected image into a circle [19] [21]. Figure 5(c) shows the impact
produced by this projection on a globe and Figure 5(d) shows an example of this type of
panorama.
- Stereographic projection (or little planet projection): This type of projection can be
used to create the illusion of a ‘little planet’ as it corresponds to a projection of the 3D
sphere of vision, as seen from the pole, onto a 2D flat plane. It is very similar to the Fisheye
projection. However, the distance to the center of the image is not equivalent to the spatial
angle, thus offering a better sense of perspective [20]. Large FOVs show the same
perspective-exaggerating characteristic as in the rectilinear projection but less
pronounced. This type of projection has a maximum FOV of 360º, in both directions, and
does not preserve either the horizontal or vertical lines [21]. Figure 5(e) shows the impact
produced by this projection on a globe and Figure 5(f) shows an example of this type of
panorama.
- Equisolid projection: This type of projection is similar to a ‘mirror ball’ where the straight
lines passing close to the center are maintained but they become more curved when
approaching the boundaries of the panorama. Although the field of view can go close to
360º, the image is circularly limited at the edges, making it ideal when the distortion is not
critical [20].
Figure 5 - Panorama projection impact and corresponding example: (a) and (b) Rectilinear projection [14] [22];
(c) and (d) Fisheye projection [14] [22]; (e) and (f) Stereographic projection [14] [22].
Other more complex projections based on some of the previously presented projections are:
- Sinusoidal projection: This type of projection aims to guarantee equal areas throughout
all sections of the image, making it possible to flatten the 3D sphere of vision and roll it
back up again into the original sphere, similarly to the Fisheye and Stereographic
projections. This characteristic is useful as it facilitates the projection of the 3D sphere of
vision onto a 2D plane while maintaining the resolution along all axes throughout the image,
which results in perfect horizontal latitude lines [19]. The maximum FOV for this type of projection is
360º in the horizontal direction and 180º in the vertical direction; it does not preserve either
horizontal or vertical lines [21]. Figure 6(a) shows an example of this type of panorama.
- Panini projection (or Vedutismo panorama): This type of projection maintains the
vertical lines vertical and the radial lines straight but displays the original horizontal straight
lines as curves; it offers a sense of correct perspective for wide angles of view with a single
central vanishing point. Straight lines which do not pass through the center will become
curved [20]. Figure 6(b) shows an example of this type of panorama.
Figure 6 - Panorama examples: (a) Sinusoidal projection; (b) Panini projection [22].
2.3. Reviewing the Main Conventional 360º Panorama Creation Solutions
Conventional 360º panoramas capture the whole FOV around the point where the image collection is
acquired, allowing an interactive user experience, e.g. through navigation over the whole scene. For this
to happen, it is first necessary to create a full-view panorama from the acquired images. In this section,
four representative conventional 360º panorama creation solutions from the literature will be reviewed.
These solutions were selected considering their technical approach in order to make this review more
conceptually varied and thus more useful for the reader. While the first solution considers a direct (pixel-
based) registration approach, the remaining solutions consider feature-based registration.
2.3.1. Solution 1: Panoramic Image Creation Combining Patch-based Global and
Local Alignment Techniques
This section will review the solution proposed by Shum and Szeliski in [23]. This solution proposes a
framework for full-view panoramic image creation where patch-based global and local alignment
techniques are combined to improve the quality of the created panorama. In the context of the previously
proposed architecture (see Section 2.1), this solution corresponds to a 360º panorama creation solution
based on a direct (pixel-based) registration technique.
A. Objectives and Technical Approach
The main objective of this solution is to enable the creation of high quality full-view panoramas from
images taken with handheld cameras by combining patch-based global and local alignment techniques.
To achieve this goal, this solution uses a rotational panorama representation, where each input
image is associated with a rotation matrix (and optionally a focal length), instead of an explicit projection
of all input images into a common composing surface. After a pairwise alignment of all images according
to the respective motion models, a global alignment technique is applied over the image collection to
reduce possible accumulated registration errors. A local alignment technique is then applied at the block
level, for each image, to further compensate for local misregistration (due to motion model inadequacy
or inaccurate camera model estimation). By combining global and local alignment techniques, the
quality of the final panorama is significantly improved.
B. Architecture and Main Tools
Figure 7 depicts the architecture of the panoramic image creation solution reviewed in this section based
on the combination of both global and local alignment techniques [23].
Figure 7 – Architecture of the panoramic image creation solution combining pixel-based global and local
alignment techniques [23].
In the following, a short walkthrough is presented, with the most interesting tools described in more
detail:
1. 8-Parameter Perspective Mosaics – Firstly, if the camera intrinsic parameters are unknown,
an initial estimate for the transformation associated with each input image is obtained by
performing motion estimation between each input image and a warped version of the mosaic
(panoramic image) resulting from the previous images’ pairwise registration; in this case, an 8-
parameter perspective transformation (i.e. homography) is used in the warping process.
2. Estimate Focal Length – Based on the initial homography estimate associated with each
image pair (computed in step 1), a rough estimate of the lens focal length is computed.
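The closed-form focal length expressions are not reproduced here; the underlying idea can, however, be illustrated with a minimal Python sketch (the function names and the search-based formulation are illustrative, not taken from [23]): under pure camera rotation, H ≈ K R K⁻¹ with K = diag(f, f, 1), so a good focal length estimate is the one that makes K⁻¹ H K closest to a rotation matrix.

```python
import numpy as np

def rotation_deviation(H, f):
    """How far K^-1 H K is from a rotation for a candidate focal length f.

    Under pure camera rotation, H ~ K R K^-1 (up to scale), so the
    conjugated matrix should be orthogonal with unit determinant.
    """
    K = np.diag([f, f, 1.0])
    M = np.linalg.inv(K) @ H @ K
    M = M / np.cbrt(np.linalg.det(M))   # remove the arbitrary scale
    return np.linalg.norm(M @ M.T - np.eye(3))

def estimate_focal_length(H, candidates):
    """Pick the candidate focal length making H closest to a rotation."""
    return min(candidates, key=lambda f: rotation_deviation(H, f))
```

In [23] the estimate is obtained in closed form from the homography entries; the exhaustive search above merely conveys the geometric constraint being exploited.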
3. Rotational Mosaics – Once the images’ focal lengths are known, and assuming that the
camera is rotating around its optical center, one rotation matrix is estimated for each input image
considering that the mapping (transformation) between two images is described by a 3-
parameter rotational model (instead of the homography model previously used). Compared to
the homography, the 3-parameter rotational model has fewer degrees of freedom, which allows
faster convergence in the rotation matrix estimation process and makes it more appropriate
to the scenario where the camera is rotating around its optical center. After associating one
rotation matrix with each input image, the image registration process can be performed in the
input image’s coordinate system, thus creating the rotational mosaics.
4. Patch-based Image Alignment – In this step, a patch-based image alignment algorithm is used
to align each image with a previously composited mosaic (resulting from the previous images'
registration) based on the rotational motion models computed in step 3. For this purpose, each
image is divided into a number of blocks or patches Pj (e.g. 8x8-sample blocks) and, for each
patch center belonging to a textured area, the corresponding point is searched for in an overlapping
image. By sequentially applying this algorithm to each input image, an initial panorama is
assembled.
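The correspondence search in this step can be sketched as follows (an illustrative, single-level exhaustive SSD search in Python; the actual solution restricts the search to textured patches and refines hierarchically):

```python
import numpy as np

def match_patch(ref, tgt, center, patch=8, search=4):
    """Find the displacement of a `patch` x `patch` block of `ref` centered
    at `center` inside `tgt` by exhaustive SSD search over a small window.

    A toy stand-in for the patch-based alignment step.
    """
    r, c = center
    h = patch // 2
    block = ref[r - h:r + h, c - h:c + h].astype(float)
    best, best_d = None, np.inf
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            cand = tgt[r + dr - h:r + dr + h, c + dc - h:c + dc + h]
            d = np.sum((block - cand.astype(float)) ** 2)
            if d < best_d:
                best, best_d = (dr, dc), d
    return best
```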
5. Block Adjustment - In this step, a global alignment (block adjustment) technique is applied to
the whole set of images, adjusting each image’s transformation (i.e. rotation and focal length)
to minimize the accumulated registration errors (resulting from the previous step); this results
in an optimally (in the least-squares sense) registered mosaic. The pairwise alignment
performed in the previous step may not be optimal since it assumes that all pixels contained in
a given patch share the same motion; thus, a hierarchical or pyramid motion estimation
technique is adopted [7]. The adopted global alignment technique is based on establishing point
correspondences between images that have overlapping areas (from the patch-based
alignment step). After dividing each image into a number of patches (e.g. 16x16 samples), patch
center correspondences are found by inter-frame transformation.
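The principle behind this global adjustment can be illustrated with a deliberately simplified 1D analogue in Python, in which per-image translations (standing in for rotations and focal lengths) are jointly adjusted in the least-squares sense from noisy pairwise measurements:

```python
import numpy as np

def block_adjust(n_images, pairwise):
    """Globally adjust per-image offsets from noisy pairwise measurements.

    `pairwise` is a list of (i, j, offset) meaning "image j should sit
    `offset` to the right of image i". Solving all constraints jointly in
    the least-squares sense (image 0 anchored at 0) spreads the
    accumulated drift over the whole image set.
    """
    A, b = [], []
    for i, j, off in pairwise:
        row = np.zeros(n_images)
        row[j], row[i] = 1.0, -1.0
        A.append(row)
        b.append(off)
    # anchor image 0 to remove the global translation ambiguity
    row = np.zeros(n_images)
    row[0] = 1.0
    A.append(row)
    b.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x
```

For instance, four images measured 10 units apart with a loop-closure measurement of 31 end up with the 1-unit drift spread evenly over the four constraints.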
6. Deghosting - After performing the global alignment (previous step), there may still be localized
registration errors present in the image mosaic, due to effects that were not taken into account
in the adopted camera model, such as camera translation and radial distortion, among others.
To compensate for these registration errors, thus making the images globally consistent, local
(patch-based optical flow) motion estimation is performed between overlapping images’ pairs.
The resulting (motion) displacements are then used to warp the respective input image to
reduce localized registration errors (ghosting) that might have survived the global alignment
step. At the end of this step, the final panorama is available.
7. Panoramic Image Mosaics - The final panorama (created in the previous step) is stored as a
collection of images with associated geometrical (rotational) transformations.
8. Environmental Maps - In this post-processing step, the final panorama is converted (mapped)
into an arbitrary texture-mapped polyhedron surrounding the origin, called environmental map,
with the goal of exploring the virtual environment. The shape of the environmental map, i.e. the
panorama type or panorama projection, is a decision left up to the user.
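As an illustration, one common environment map shape is a cube; the following Python sketch maps a viewing direction to a cube face and normalized texture coordinates (the face naming and (u, v) conventions are illustrative, since such conventions vary between systems, and a direction with a non-zero dominant axis is assumed):

```python
def cube_face(direction):
    """Map a 3D viewing direction to a cube-map face and (u, v) in [0, 1].

    The face holding a direction is the one of the axis with the largest
    absolute component; (u, v) locate the intersection on that face.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, u, v = ('+x' if x > 0 else '-x'), -z / x, y / ax
    elif ay >= az:
        face, u, v = ('+y' if y > 0 else '-y'), x / ay, -z / y
    else:
        face, u, v = ('+z' if z > 0 else '-z'), x / az, y / az
    return face, (u + 1) / 2, (v + 1) / 2
```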
C. Performance and Limitations
All the experiments reported in [23] were performed using a rotational panorama representation with
unknown focal length. The overlapping area between two neighboring images was around
50%. For the patch-based and global alignment steps, a patch size of 16, an alignment accuracy of 0.04
pixel and 3 levels in the pyramid were considered.
Figure 8 illustrates a panorama created from 6 images, acquired with a leveled and tilted-up camera,
before (Figure 8(a)) and after (Figure 8(b)) applying the global alignment technique described in Section
2.3.1. B (step 5); Figure 8(c) and Figure 8(d) correspond to close-ups of Figure 8(a) and Figure 8(b),
respectively. In Figure 8, the images do not cover the entire horizontal FOV of the 3D sphere of vision
(only 6 images were used).
Figure 8 – Mitigating misregistration errors by applying global alignment: (a) image mosaics with visible gaps/overlaps; (b) corresponding image mosaics after applying the global adjustment technique; (c) and (d)
close-ups of left middle regions of (a) and (b), respectively [23].
In Figure 8(c), which shows a close-up of double images on the middle left side of Figure 8(a), it is
possible to see a misalignment which is no longer noticeable in Figure 8(b) and Figure 8(d), after
applying the global alignment technique.
Figure 9(a) shows a panorama created from two images acquired with a handheld camera where
some camera translation occurs. The local misregistration resulting from the motion parallax introduced
by the camera translation is visible in Figure 9(a), notably through the double image (i.e. ghosting effect)
of the stop sign. This effect is significantly reduced using the local alignment (deghosting) technique
(described in Section 2.3.1. B – step 6), as shown in Figure 9(b). Nevertheless, some artifacts are still
visible due to the fact that this technique is patch-based (with a patch size of 32 in this example) instead
of pixel-based. To overcome this problem, the local alignment technique is repeatedly applied with
successively smaller patch sizes. Figure 9(c) shows the panorama image in Figure 9(a) after applying
the local alignment technique three times, with patch sizes of 32, 16 and 8.
Figure 9 – Mitigating the effect of motion parallax by applying local alignment: (a) image mosaic with parallax; (b) image mosaic after applying a single deghosting step (patch size of 32); (c) image mosaic after applying
three deghosting steps (patch sizes of 32, 16 and 8) [23].
As can be observed in Figure 9(c), this iterative local alignment process is able to refine the local
alignment and handle large motion parallax, thus considerably improving the quality of the created
panorama.
The most important limitations of the panoramic image creation solution presented in this section,
combining both global and local alignment techniques, are: 1) filling a gap (between the first and last
image in the input set, occurring due to accumulated misregistration errors) or removing an overlap
present in a panoramic image only works well for a set of images with uniform motion steps (i.e. pure
panning motions) and requires that the set of images encompasses the entire horizontal FOV of the 3D
sphere of vision; and 2) the global and local alignment techniques are patch-based (rather than
performing direct, pixel-wise intensity difference minimization) and thus may not remove all visible
artifacts present in the panorama.
2.3.2. Solution 2: Panoramic Image Creation using Invariant Feature based
Alignment and Multi-Band Blending
This section reviews the solution developed by Brown and Lowe in [24]: an automatic panoramic
image creation solution using feature-based alignment and multi-band blending techniques. Regarding
the architecture previously proposed in Section 2.1, this work corresponds to a 360º panorama creation
solution based on a feature-based registration technique.
A. Objectives and Technical Approach
The major objective of this second solution is to allow a fully automatic panoramic image creation, where
no input information on the image collection (e.g. images order) and no initialization of the image
alignment process is required from the user. This solution addresses the full-view panorama creation
problem as a multi-image matching problem, making it possible to recognize panoramas in a collection
of input images containing several panoramas; invariant local features are used to establish matches
between images in the collection, and multi-band blending is used to create seamless panoramas.
In this context, this solution starts by establishing accurate matches between the set of input images
using invariant local features, which remain unchanged with varying orientation, zoom and illumination
(due to changes in exposure/aperture and flash settings) in the input images; due to the features’
invariance properties, no input information on the image collection (e.g. image ordering) is required from
the user. After that, pairwise matching is established between each input image and the overlapping
images with the largest number of matched features; each set of matching images defines a panoramic
sequence. A global alignment technique (bundle adjustment) is then applied over each set of matching
images to reduce possible accumulated registration errors resulting from the previous registration step.
Then, an automatic panorama straightening technique is used to correct a possible wavy effect that
might be present in the panorama, due to relative camera motion around its optical center. Gain
compensation is applied afterwards to reduce the effect of different intensities in overlapping images.
Lastly, a multi-band blending technique is used to minimize the effect of false edges in the overlapping
regions of images that might still be visible in the panorama (due to unmodelled effects such as
vignetting or even some unwanted motion parallax of the camera's optical center), ensuring a smoother
transition between images and allowing this solution to output seamless panoramas.
B. Architecture and Main Tools
Figure 10 illustrates the architecture of the invariant feature-based, fully automatic panoramic image creation solution reviewed in this section.
Figure 10 - Architecture of the invariant feature based automatic panoramic image creation solution.
A brief walkthrough of the architecture depicted in Figure 10 is presented in the following, reviewing
the most interesting tools in more detail:
1. Feature-based Registration: First, SIFT features [10] are detected and extracted from all input
images. Each feature location is assigned a characteristic orientation and scale with the
objective of selecting stable features, i.e. features that remain constant under
changes of illumination, viewpoint or other viewing conditions and, therefore, can be extracted
even in images exhibiting orientation and zoom variations. The orientation and scale of each
SIFT feature is then saved in a feature descriptor vector. The feature orientation is useful when
the target image (i.e. image where a feature correspondence is looked for) is rotated with
respect to the reference image where the initial feature was extracted. The SIFT descriptor is
scale-invariant since it is computed by accumulating local gradients in orientation histograms
that are measured at the selected scale in a region/patch around each keypoint; this
characteristic provides robustness to affine changes since it enables edges to shift smoothly
without changing the local descriptor. By making use of gradients and normalizing the
descriptor, it is also possible to achieve illumination invariance. After all features have been
extracted from all input images in the collection, each feature is matched to its k nearest neighbors in
the descriptor space using a k-d tree method [25] (this solution considered k = 4).
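The nearest-neighbour matching in this step can be sketched as follows (a brute-force Python stand-in for the k-d tree search; the descriptor arrays and the helper name are illustrative):

```python
import numpy as np

def knn_matches(desc_a, desc_b, k=4):
    """For each descriptor in `desc_a` (one per row), return the indices of
    its k nearest neighbours in `desc_b` (Euclidean distance in descriptor
    space; SIFT descriptors would be 128-D rows).

    A brute-force stand-in for the k-d tree search used by the solution.
    """
    # squared distances between every row of desc_a and every row of desc_b
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]
```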
2. Image Matching: For each input image, the overlapping images with the highest number of
descriptor matches to that image are selected, forming the set of potentially matching images;
in practice, a constant number of 6 images was considered for the potentially matching images
set size. Then, the RANSAC algorithm [12] with DLT (Direct Linear Transformation) [26] is
applied to each pair of candidate images, estimating the transformation (i.e. homography)
between the images. Feature/descriptor matches lying inside
the overlapping area that are geometrically consistent with the estimated homography form the
inlier features set while the remaining features (inside the overlapping area) that are not
geometrically consistent form the outlier features set. Afterwards, a probabilistic model is used to
verify each image match based on the number of inliers. After establishing the pairwise
matches between images, it is possible to recognize different panoramas in the image collection
by clustering it into connected sets of matching images; images that do not match any other
image in the collection do not belong to a panorama and are therefore considered noise images.
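The homography estimation in this step can be sketched in Python as follows (a simplified illustration of the DLT inside a RANSAC loop; it omits, e.g., data normalization and the probabilistic verification model):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a homography from >= 4 point pairs via the DLT: stack two
    linear constraints per pair and take the null vector of the system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=200, thresh=2.0, seed=0):
    """Toy RANSAC loop: fit H on random 4-point samples and keep the
    hypothesis with the most geometrically consistent matches (inliers)."""
    rng = np.random.default_rng(seed)
    best_H, best_inl = None, []
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        p = np.c_[src, np.ones(len(src))] @ H.T
        proj = p[:, :2] / p[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inl = np.where(err < thresh)[0]
        if len(inl) > len(best_inl):
            best_H, best_inl = H, inl
    return best_H, best_inl
```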
3. Bundle Adjustment - Once the set of geometrically consistent matches between the images is
found, it is necessary to perform global alignment over each cluster of matching images in order
to minimize accumulated errors resulting from the pairwise image registration in the previous
steps. To do that, a bundle adjustment technique [27] is used to estimate all the camera
parameters (i.e. rotation and focal length) simultaneously. Each image and its best matching
image (i.e. the one with the highest number of consistent matches) are added to the bundle
adjuster, one at a time; each time a new image is added, the bundle adjuster is initialized with
the same camera parameters (i.e. focal length and orientation) as the image to which it best
matches. After projecting each feature into overlapping images with corresponding features, the
camera parameters are then updated using the Levenberg-Marquardt algorithm [28] by
minimizing the sum of squared projection errors.
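The Levenberg-Marquardt update can be illustrated with a deliberately reduced, one-parameter Python example, fitting a single 2D rotation angle to point correspondences (the actual bundle adjuster solves for all rotations and focal lengths jointly):

```python
import math

def lm_fit_rotation(pts, obs, theta=0.0, lam=1e-3, iters=50):
    """Minimal Levenberg-Marquardt loop fitting one rotation angle so
    that rotating `pts` matches `obs` in the least-squares sense."""
    def residuals(t):
        c, s = math.cos(t), math.sin(t)
        return [v for (x, y), (u, w) in zip(pts, obs)
                for v in (c * x - s * y - u, s * x + c * y - w)]

    def jacobian(t):
        c, s = math.cos(t), math.sin(t)
        return [v for (x, y) in pts for v in (-s * x - c * y, c * x - s * y)]

    for _ in range(iters):
        r, J = residuals(theta), jacobian(theta)
        sse = sum(v * v for v in r)
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * v for j, v in zip(J, r))
        step = -Jtr / (JtJ * (1 + lam))   # damped normal-equation step
        if sum(v * v for v in residuals(theta + step)) < sse:
            theta += step
            lam *= 0.5                    # good step: trust the model more
        else:
            lam *= 4.0                    # bad step: damp harder
    return theta
```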
4. Automatic Panorama Straightening - The goal here is to remove a wavy effect that may be
present in the final panorama due to unknown 3D rotations relative to a chosen world coordinate
frame, which were not taken into account in previous registration steps (only the relative rotation
between different positions of the camera was considered). It is unlikely that the camera
acquiring all the images is perfectly leveled and untilted. However, it is reasonable to assume
that twisting the camera relative to the horizon is something people rarely do, which minimizes
the impact of this problem. In this context, the automatic panorama
straightening technique corrects the wavy effect of the panorama by applying a global rotation
such that the vector perpendicular to the plane containing the camera center and the horizon
becomes vertical in the projection plane.
5. Gain Compensation - The overall intensity gain between images is first computed by defining
an error function (in this case, the sum of gain-normalized intensity errors) over all overlapping
samples. Once the gains are known, gain compensation is performed over all overlapping
samples thus reducing the intensity differences between overlapping images.
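The gain computation can be sketched as follows in Python (a simplified version with a single mean intensity per overlapping pair; the prior term keeps the gains close to 1, since the trivial all-zero solution would otherwise minimize the error):

```python
import numpy as np

def gain_compensation(n, overlaps, prior=0.1):
    """Solve for per-image gains g minimizing
        sum over overlaps (g_i * Ii - g_j * Ij)^2 + prior * sum (g_i - 1)^2,
    where Ii, Ij are the mean intensities the two images measure in their
    common region. The normal equations are assembled and solved directly.
    """
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, Ii, Ij in overlaps:
        A[i, i] += Ii * Ii
        A[j, j] += Ij * Ij
        A[i, j] -= Ii * Ij
        A[j, i] -= Ii * Ij
    A += prior * np.eye(n)   # prior pulls every gain towards 1
    b += prior
    return np.linalg.solve(A, b)
```

For two images measuring 100 and 50 in their overlap, the solution balances the two constraints: the gains end up near 0.6 and 1.2, so both images map the overlap to roughly the same intensity.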
6. Multi-band Blending - Lastly, a multi-band blending technique [29] is used to reduce some
unwanted effects present in the final panorama (e.g. image edges that are still visible due to a
number of unmodelled effects, such as parallax effects, radial distortion, and vignetting, among
others). To solve this problem, each image, expressed in spherical coordinates, is iteratively
filtered, using a Gaussian filter with a different standard deviation value in each iteration, and a
high-pass version of each image is created (in each iteration) by subtracting the filtered version
from the original image; the high-pass version of the image represents the spatial frequencies
in the range established by the Gaussian filter standard deviation value. Blending weight maps
are also created for each image in each (image filtering) iteration. The blending weight map of
each image is initialized by finding the set of samples in the previously created panoramic
image for which that image is the most responsible (for the sample values), and is then iteratively
filtered (while the image filtering takes place) using the same filter applied
to the image. The final panorama results from a weighted sum of all high-pass filtered versions
of each overlapping image, where the blending weights are obtained as previously described.
Therefore, low frequencies are blended over a large spatial range, while high frequencies use
a short range, thus allowing smooth transitions between images even with illumination changes
(while at the same time preserving high frequency details).
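The frequency-dependent blending ranges can be illustrated with a deliberately reduced, two-band, 1D Python sketch (the actual technique uses several Gaussian-filtered bands of 2D images in spherical coordinates):

```python
import numpy as np

def two_band_blend(a, b, seam):
    """1D, two-band sketch of multi-band blending of signals `a` and `b`
    joined at index `seam`: low frequencies are mixed with a wide linear
    ramp, while high frequencies switch sharply at the seam."""
    def lowpass(x, k=5):
        return np.convolve(x, np.ones(k) / k, mode='same')

    la, lb = lowpass(a), lowpass(b)
    ha, hb = a - la, b - lb            # high-pass = original minus filtered
    n = len(a)
    ramp = np.clip((np.arange(n) - seam) / 10.0 + 0.5, 0.0, 1.0)  # wide blend
    hard = (np.arange(n) >= seam).astype(float)                   # sharp seam
    return la * (1 - ramp) + lb * ramp + ha * (1 - hard) + hb * hard
```

Away from the seam each signal is reproduced exactly; around the seam, only the low band is mixed over a wide range, which is what hides exposure differences without blurring high-frequency detail.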
C. Performance and Limitations
In the reported experiments [24], a set of 24 input images, depicted in Figure 11(a), was used: 19 images
that match other images in the set (and, thus, belong to a panorama) and 5 images that do
not match any other image in the collection (i.e. noise images). In Figure 11(a), each arrow
connecting a pair of images indicates that a coherent set of feature matches was detected between
that image pair. As can be noticed from Figure 11(a), this solution can effectively establish accurate
matches between overlapping images.
Figure 11 – Recognizing panorama capability: (a) image collection containing connected sets of images that will later form different panoramas, plus noise images; (b) 4 different blended panoramas outputted by the panorama creation solution [24].

Figure 11(b) shows the four different panoramas resulting from the automatic panoramic image
creation solution reviewed in this section. By observing Figure 11(b), it is possible to conclude that this
solution can effectively recognize panoramas in a collection of input images containing several
panoramas while ignoring the noise images present in the collection.
Figure 12 shows a final panorama created from 57 different input images, with a spatial resolution of
2272x1704 pixels, acquired using the camera’s automatic mode, which allows the camera aperture and
exposure time to change, and the flash to fire, when appropriate during the acquisition of some images.
Figure 12(a) shows the panorama created without applying the gain compensation
technique and Figure 12(b) shows the same panorama with gain compensation applied. The
multi-band blending technique was applied to create a seamless 360ºx100º final panorama (with a
resolution of 8908x2552 pixels) projected in spherical coordinates.
Figure 12 – Panoramas produced: (a) without gain compensation; (b) with gain compensation; (c) with both gain compensation and multi-band blending [24].
The final panorama presented in Figure 12(c) shows (by comparison with Figure 12(a) and (b)) how
the gain compensation technique effectively reduces the effect produced by large changes in brightness
present in the input images; it also shows the multi-band blending technique capability to conveniently
smooth the remaining edges (after gain compensation) that were still noticeable due to unmodelled
effects such as vignetting.
The biggest limitations of the solution described above are: 1) creating panoramic images of 3D
world scenes containing many moving objects or several large objects will introduce visible
artifacts in the final panorama, as the multi-band blending technique (described in Section 2.3.2. B – step
6) was not designed to accommodate that type of scene; 2) image collections with large changes in
brightness will also cause noticeable artifacts in the panorama because strong brightness changes cannot
always be corrected by gain compensation and multi-band blending (described in Section 2.3.2. B –
steps 5 and 6); and 3) the cameras’ radial distortion will create noticeable artifacts in the final panorama as
well because this effect was not taken into account in the bundle adjustment technique (described in
Section 2.3.2. B – step 3).
2.3.3. Solution 3: Panoramic Image Creation using Invariant Feature based
Alignment and Seamless Image Stitching
The third solution to be reviewed was developed by Eden et al. in [30]. This work focuses on the
problem of seamless high dynamic range (HDR) panorama creation from scenes containing
considerable exposure differences, large motions and other misregistrations between the images in the
input set. In the context of the previously proposed architecture (see Section 2.1), this work fits in the
class of 360º panorama creation solutions based on feature-based registration.
A. Objectives and Technical Approach
This solution has the primary objective of enabling the creation of seamless HDR panoramas from a
collection of input images taken with standard handheld cameras that may contain large scene motions,
exposure differences and misregistrations between images, e.g. due to parallax and camera calibration
errors, among others.
To reach the intended objective, this solution starts by geometrically aligning the input images,
acquired at different orientations and exposures, using a feature based registration technique similar to
the one described in Section 2.3.2 [16] [24], which is able to handle exposure differences. After the input
images are geometrically aligned, they are radiometrically aligned by mapping the images to a common
global radiance space map; this is done by computing a radiance value for each image sample relating
the camera settings with each sample’s measured intensity (inversely mapped through the camera
model used). After determining the radiance values corresponding to each image, the final panorama is
created (in the radiance space) by selecting for each sample a radiance value from one of the images
in the input collection. Two distinct steps are here involved: first, a reference panorama covering the full
angular extent of the image collection is created from a subgroup of the (geometrically and
radiometrically) aligned images using a graph-cut technique similar to that described in [31] (while
working in the radiance space); this first step makes it possible to identify and fix the presence of moving objects
in the acquired scene. In the second step, the dynamic range of the reference panorama is extended to
the one present in the collection of input images by using cost functions favoring sample values with
higher signal-to-noise ratio (SNR) while trying to ensure smooth transitions between the images. Lastly,
a blending technique can be applied to the reference panorama, resulting in the final seamless HDR
panorama.
B. Architecture and Main Tools
Figure 13 illustrates the architecture of the seamless HDR panorama creation solution using invariant
feature based alignment and seamless image stitching, which was designed to handle large motion
scenes and exposure differences.
Figure 13 - Architecture of the invariant feature based seamless HDR panorama creation solution [30].
A brief walkthrough of this solution is presented in the following, giving more emphasis to the most
relevant tools (names for the modules come from the reference):
1. Capture with Varying Orientation and Exposure: In this first step, the acquisition of all images
that will later be used to create the desired HDR panorama is performed. The input image
collection is characterized by including images with large scene motion and considerable
exposure differences.
2. Geometrically Align: In this step, an automatic feature based alignment technique, similar to
the one described in Section 2.3.2 [16] [24], is used to geometrically align the input images. This
technique was chosen since it is capable of handling sets of input images acquired under
different exposure conditions.
3. Radiometrically Align: In this step, the set of input images already geometrically aligned (from
the previous step) undergoes a radiometric alignment process by computing a radiance value
for each image sample. This is done by relating the camera settings (shutter speed, aperture,
ISO and white balance) with the intensity measured at each sample; the camera is pre-
calibrated using a technique similar to the one in [32]. While the shutter speed, aperture and
ISO settings are extracted from EXIF (Exchangeable Image File Format) tags typically available
in digital still cameras [33], the white balance is computed from a rough estimate of the radiance.
This is achieved by selecting a reference image from the image collection with the desired color
balance and computing a per color channel gain (via least squares) so that the remaining
images match the color balance of the chosen reference image. After determining the gains,
they are applied inversely to the rough radiance estimate to normalize the color balance
differences, thus obtaining the final radiance value for each sample in each image.
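The radiometric mapping can be sketched as follows (an illustrative Python sketch assuming an already-linearized intensity and a simple exposure model; the actual solution inverts a calibrated camera response):

```python
def to_radiance(intensity, shutter, aperture, iso, wb_gain=1.0):
    """Map a (linearized) sample intensity to a relative radiance value by
    dividing out the exposure implied by the camera settings.

    Simplified model: exposure ~ iso * shutter / aperture^2, and the
    per-channel white-balance gain is applied inversely, as described above.
    """
    exposure = iso * shutter / (aperture ** 2)
    return (intensity / wb_gain) / exposure
```

The key property is that the same scene radiance, recorded under two different camera settings, maps to the same value.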
4. Image Selection: In this step, the two major objectives are to detect and fix the presence of
moving objects in the scene and also to fill the entire dynamic range of the reference panorama
from the dynamic range available in all images from the input collection. This can be achieved
through two steps:
a. Firstly, a reference panorama is created using a subgroup of images from the input
collection, after the geometric and radiometric alignment performed in the
previous steps. The chosen subgroup of images covers the entire FOV present in the
collection of input images, but not necessarily the entire available dynamic range. The
reference panorama is created using a two-step graph-cut technique similar to [31],
but with the particularity of working in the radiance space, where each sample of the
reference panorama has a corresponding value from one of the images in the subgroup;
b. Lastly, the entire dynamic range available from all images contained in the input
collection is used to extend the dynamic range of the previously created reference
panorama (adding some details whenever possible). To do that, data and seam cost
functions are used to select (preferably) samples with higher signal-to-noise ratios
(samples with higher intensity have higher signal-to-noise ratios) while keeping smooth
transitions. Each selected sample is assigned a label identifying the input image it
comes from. An energy-minimizing graph cut algorithm [34]
[35] is used to optimally select the sample (and corresponding label) minimizing both
cost functions.
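To convey the cost structure being minimized, the following Python sketch solves a deliberately reduced 1D version of the labeling problem by dynamic programming instead of a graph cut (data term: a per-position, per-label cost, e.g. inverse SNR; smoothness term: a constant seam cost per label switch):

```python
def select_labels(data_cost, seam_cost):
    """Pick one label per position minimizing
        sum_t data_cost[t][l_t] + seam_cost * [l_t != l_{t-1}]
    by dynamic programming over positions (Potts smoothness model)."""
    n, L = len(data_cost), len(data_cost[0])
    cost = list(data_cost[0])   # best cost ending at each label
    back = []                   # backpointers for recovering the labels
    for t in range(1, n):
        prev_best = min(cost)
        step, new_cost = [], []
        for l in range(L):
            stay = cost[l]
            switch = prev_best + seam_cost
            if stay <= switch:
                step.append(l)
                new_cost.append(stay + data_cost[t][l])
            else:
                step.append(cost.index(prev_best))
                new_cost.append(switch + data_cost[t][l])
        back.append(step)
        cost = new_cost
    labels = [cost.index(min(cost))]
    for step in reversed(back):
        labels.append(step[labels[-1]])
    return labels[::-1]
```

With a moderate seam cost the labeling follows the cheaper source and switches once; with a very large seam cost a single source is kept throughout, mirroring the data/smoothness trade-off of the graph cut.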
5. Blend Images: In this step, an image blending technique is applied to create the HDR
panorama. In this case, the sample value for each position in the final HDR panorama is directly
copied from the corresponding input image according to the samples (and corresponding labels)
selection performed in the previous step.
6. Tonemap: Finally, a tone-mapper may be applied to the HDR panorama if it is to be shown
on a display with a lower dynamic range; the tone-mapper used is not specified in [30].
C. Performance and Limitations
In the reported experiment [30], a set of input images acquired with a handheld rotating camera in auto-
bracket mode (i.e. the camera takes 3 different shots under different exposure conditions) was used to
create the HDR panorama. Figure 14(a) shows the set of input images after performing geometric
registration; it should be noticed that the set of input images presents a considerable amount of parallax
and also includes a person moving in the scene. Figure 14(b) center illustrates the reference panorama
after applying the first step of the image selection process, created from the input images with the
shortest exposure, and Figure 14(c) center illustrates the final HDR panorama generated after applying
the second step of the image selection process. Figure 14(b) right illustrates the result of applying a
tone-mapper to the reference panorama; this was done only to demonstrate the presence of noise in
the darker areas of the scene, since the reference panorama does not consider the full dynamic range
available in the image collection. Figure 14(b) and (c) left show the reference labels identifying the input
image (more specifically, the pixel value) selected to compose the desired panorama, before and after
the second step of the image selection process, respectively. Figure 14(c) shows the result of applying
the second step of the image selection process to the reference panorama; as can be seen, this step
considerably improves the quality of the reference panorama, but there are still visible artifacts due to
parallax and motion (notice that the tone-mapper also introduced some artifacts).
Figure 14 - HDR panorama creation [30]: (a) Registered input images; (b) Results after applying the first step of image selection: reference labels (left), resulting panoramic image (center) and tone-mapped version of the panoramic image created (right); (c) Results after applying the second step of image selection: final reference label (left), HDR panorama (center) and tone-mapped version of the
HDR compressed panorama (right).
From Figure 14(b) center and right, it is possible to conclude that, after the first step of the image
selection process, the reference panorama encompasses the entire FOV of all images (Figure 14(c)
center and right show the same FOV) but not the full high dynamic range available in the input set.
As shown in Figure 14(b) left, only two images from the input collection (each color corresponds to an
image) were necessary for the reference panorama to include the entire FOV of the
input set. From Figure 14(c) center and right, it is possible to notice that, by using a few more images
(see Figure 14(c) left), the dynamic range of the previously created reference panorama is conveniently
extended, adding more detail in many places. Figure 14(c) center and right show the final HDR
panorama without significant misalignments but the transitions between the images used to form this
panorama are not very smooth. The reason for this behavior is that, as previously mentioned, the sample
value for each position in the final HDR panorama is directly copied from the corresponding input image
according to the samples (and corresponding labels) selection process.
The most important limitations of this third solution are: 1) lack of an automatic mechanism to properly
select which images describe the location of the objects in the creation of the reference image; 2) lack
of image blending and tone-mapping techniques specifically designed for the HDR panorama creation
scenario; and 3) the vignetting effect present in images acquired with low-quality digital still cameras is
not addressed.
2.3.4. Solution 4: Panoramic Image Creation using a Locally Adaptive Alignment
Technique based on Invariant Features
The fourth solution to be reviewed was developed by Zaragoza et al. and presented in [36]. This work
focuses on improving the accuracy of the alignment of multiple images by estimating a piecewise perspective
transformation to account for deviations of the input data from the global perspective model, while preserving
the geometric realism of the scene in the resulting full-view panorama. In the context of the previously
proposed architecture in Section 2.1, this solution corresponds to a 360º panorama creation solution
based on a feature-based registration approach.
A. Objectives and Technical Approach
The foremost objective of this solution, denominated the As-Projective-As-Possible (APAP) method, is
to enable natural-looking panoramic image creation (ideally without visible misalignment artifacts) from
a collection of input images that differ from each other not only by rotation but also by translation; rotation
and translation variations are typical of a scenario where images are acquired by a casual user with a
handheld camera.
To achieve this goal, this solution begins by establishing accurate matches between the input images
in the collection using invariant features. Then, pairwise matching is performed between all overlapping
images, resulting in a set of global (rigid) homographies. The estimated global homographies are then
used to map all the keypoints from all overlapping images into a reference image selected from the
images collection. This reference image is then uniformly divided into a number of n x m cells, taking
each cell center as the cell representative sample. After that, a set of sample location dependent
homographies are determined between the reference image and overlapping images (i.e. mapping each
sample cell center and the remaining samples within the same cell of the reference image to one of the
remaining overlapping images) using the proposed Moving DLT method [36]. Then, a locally weighted
bundle adjustment [36] technique is used to simultaneously refine all the previously estimated adaptive
homographies, thereby improving the alignment between all overlapping images. Finally, an image
blending technique available in the literature is used to produce the final full-view blended panorama.
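The core of the approach, the sample-location-dependent homography, can be illustrated with a minimal numpy sketch of the weighted DLT estimation at one cell center. The weighting scheme follows the idea described in [36] (Gaussian decay floored at a small offset), but the parameter values and helper names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dlt_rows(p, q):
    """Two DLT constraint rows for one correspondence p -> q (2D points)."""
    x, y = p
    u, v = q
    return np.array([
        [0, 0, 0, -x, -y, -1,  v * x,  v * y,  v],
        [x, y, 1,  0,  0,  0, -u * x, -u * y, -u],
    ], dtype=float)

def moving_dlt_homography(src, dst, center, sigma=8.0, gamma=0.05):
    """Location-dependent homography at one cell center (Moving DLT sketch).

    src, dst: (N, 2) matched keypoints; center: (2,) cell representative
    sample. Matches near the cell center get weights close to 1; weights
    decay with distance but are floored at gamma, so distant matches keep
    a small influence and the warp stays close to a global homography.
    """
    d2 = np.sum((src - center) ** 2, axis=1)
    w = np.maximum(np.exp(-d2 / sigma ** 2), gamma)
    A = np.vstack([dlt_rows(p, q) for p, q in zip(src, dst)])
    W = np.repeat(w, 2)                        # one weight per pair of rows
    _, _, vt = np.linalg.svd(A * W[:, None])   # weighted least squares
    return vt[-1].reshape(3, 3)                # h = smallest singular vector
```

When the matches are exactly consistent with a single global homography, every cell recovers that homography (up to scale), which is the "reduces smoothly to a global homography" behavior noted above.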
B. Architecture and Main Tools
Figure 15 depicts the architecture of the panoramic image creation solution using a locally adaptive
alignment technique based on invariant features reviewed in this section.
Input Images → Feature-based Registration → Image Matching → Locally Weighted Bundle Adjustment → Image Blending → Final 360º Panorama
Figure 15 – Architecture of the panoramic image creation solution using a locally adaptive alignment technique
based on invariant features.
A brief walkthrough of the architecture is presented in the following, where the most interesting tools
deserve more detail:
1. Feature-based Registration: In this step, a set of SIFT features is extracted from all input
images. Those features are then matched to their k-nearest neighbors using a k-d tree algorithm
[23], in a similar way to the process described in Section 2.3.2 B – Step 1.
2. Image Matching: In this step, pairwise matching between overlapping images is performed in
a similar way to the process described in Section 2.3.2 B (step 2) [24]. This matching method
applies the RANSAC algorithm [12] with DLT [26] over each potential overlapping matching pair
of images to estimate a global (rigid) homography between them. An image connection graph
is then constructed based on the estimated set of global homographies, thus relating
overlapping pairs of images, with the goal of identifying the input image with the highest number
of feature matches with the remaining overlapping images; this image is chosen as the
reference image. The set of estimated global homographies are then used to map all the
keypoints from all overlapping images into the reference image. The keypoints coordinates in
the reference image sharing the same identity (determined by the pairwise matches established)
are averaged. The outcome of this image matching step is a set of coordinates (i.e. a set of
sample locations) within the reference image, where each sample location is potentially
matched to a particular keypoint present in one of the remaining overlapping images.
3. Locally Weighted Bundle Adjustment: After determining the reference image and the set of
sample locations (within the reference image) potentially matched to keypoints in the remaining
overlapping images, the reference image is uniformly divided into n × m cells; the center sample
of each cell is taken as the cell’s representative sample. For each cell, local weights are
computed for the cell center based on the distance between each point matching (i.e. sample
location potentially matched to a keypoint in one overlapping image) and the cell’s
representative sample. After that, for the representative sample of each cell in the reference image
for which a match has been found in one of the remaining overlapping images, a set of sample
location dependent homographies is estimated using the proposed Moving DLT method [36]; this
method aims at estimating the projective warping (sample location dependent homography) that
best respects the local structure around the cell's center sample. These homographies, locally
adapted to the cell’s representative sample, are then applied to every sample within the cell.
This way, it is possible to create an overall warping that flexibly adapts to the input data while
attempting to preserve the projective trend of the warp (homography); in fact, these adaptive
homographies reduce smoothly to a global homography as the camera translation becomes
zero. Then, bundle adjustment is used to simultaneously refine the set of sample location
dependent homographies by minimizing the transfer error of all established correspondences.
The transfer error is a weighted sum over all correspondences of the distance between each
keypoint in the overlapping images and the projective warp (obtained using the locally adapted
homographies previously estimated) of the potentially matched keypoint in the reference image.
4. Image Blending: Finally, the sample values for each position of the final panorama are obtained
by applying a blending method over all overlapping images, which minimizes seam effects that
may remain visible after the previous steps. In this work [36], this module has been implemented
using image blending techniques available in the literature (e.g. pixel intensity averaging,
feathering blending [6] and seam cutting [31]). For the results presented in Figure 16, the
overlapping images are simply blended using sample intensity averaging.
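Among the blending options mentioned in step 4, feathering can be sketched generically as follows. This is an illustrative implementation (not the one used in [36]): each warped image is weighted by every pixel's distance to the edge of its coverage mask, and the weighted images are normalized. It assumes the masks leave some uncovered canvas pixels; otherwise the distance transform is unbounded.

```python
import numpy as np

def feather_blend(images, masks):
    """Feathering blend sketch over images warped into a common canvas.

    images: list of HxW float arrays already warped to the panorama frame
    masks:  list of HxW bool arrays marking valid (covered) pixels
    """
    def border_distance(mask):
        # two-pass chamfer distance to the nearest invalid pixel
        h, w = mask.shape
        d = np.where(mask, np.inf, 0.0)
        for y in range(h):
            for x in range(w):
                if y: d[y, x] = min(d[y, x], d[y - 1, x] + 1)
                if x: d[y, x] = min(d[y, x], d[y, x - 1] + 1)
        for y in range(h - 1, -1, -1):
            for x in range(w - 1, -1, -1):
                if y < h - 1: d[y, x] = min(d[y, x], d[y + 1, x] + 1)
                if x < w - 1: d[y, x] = min(d[y, x], d[y, x + 1] + 1)
        return d

    weights = [border_distance(m) for m in masks]
    total = np.maximum(sum(weights), 1e-9)     # avoid division by zero
    return sum(w * im for w, im in zip(weights, images)) / total
```

In overlap regions the contribution of each image fades out toward its own border, which is exactly what suppresses visible seams relative to plain intensity averaging.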
C. Performance and Limitations
The results reported in [36] were obtained using 7 images of 2000x1329 pixels each, corresponding
to views differing in terms of rotations and translations. The reference image was
partitioned into 100x100 cells and the number of keypoints matched within this image was 13380. The
aligned images were projected into a cylindrical surface and image blending was performed using simple
pixel intensity averaging. Figure 16 illustrates the resulting panoramas obtained with the APAP solution
and the solution reviewed in Section 2.3.2 (denominated here as Autostitch). The red circles in both
panoramic images highlight alignment errors.
Figure 16 – Panoramas created with the Autostitch and APAP solutions [36].
As can be observed in Figure 16, the panorama created with the Autostitch solution presents
considerable misalignment artifacts. This is due to the fact that this solution aligns the images using
global homographies (applied to all samples of each pair of overlapping images), which are estimated
assuming that the overlapping images correspond to views that differ purely by a rotation. Conversely,
the APAP solution, based on location dependent homography estimation (via Moving DLT) and locally
weighted bundle adjustment, is able to handle more accurately the input images' deviations from the
aforementioned assumption; when compared to the Autostitch solution, the APAP approach leads to
far fewer misalignment artifacts in the static areas of the scene. However, neither solution can handle
large objects moving through the scene (such as people) without the use of pixel blending methods.
The most important limitation of this fourth solution is that it is not able to properly handle large
moving objects in the scene (which inevitably produce visible artifacts in the final panoramic image),
without the use of advanced pixel blending methods.
Chapter 3
3. Light Field based 360º Panoramas Creation
In this chapter, the basic concepts as well as some first methodologies and tools involved in the creation
of 360º panoramas using a light field imaging representation will be addressed. To achieve this goal,
this chapter begins by presenting the basic concepts regarding the light field imaging representation,
proceeding afterwards to a brief review of two representative solutions for light field based 360º
panorama creation. Contrary to the previous chapter, where conventional imaging 360º panorama
creation was addressed, this chapter addresses 360º panorama creation in the context of a new
imaging representation paradigm: light fields.
3.1. Basic Concepts
Since the beginning of photography, photographers have had the desire to change and edit their
pictures after they have been acquired. Conventional pictures are largely defined at the acquisition
moment, since parameters like the focus and the viewpoint are fixed at that time and cannot be changed
afterwards, thus creating several limitations associated to the traditional concept of photography. After
the acquisition moment, some visual information is irreversibly lost since conventional cameras only
capture in their sensors the total sum of the light rays that strike the same position in the camera lens,
rather than capturing the amount of light carried by each single light ray that contributes to the acquired
image. Whether they are analog or digital, conventional cameras only capture a two-dimensional
representation of the 3D world scene using the two available dimensions (the x and y axes) of the
camera film/sensor. This is clearly a limited imaging representation of the real visual scene.
Nowadays, with the recent emergence of new sensors and cameras capturing higher dimensional
representations of the visual world, the conventional imaging representation model corresponding to a
collection of rectangular samples for some wavelength ranges (i.e. 2D trichromatic image) is being
challenged. These tremendous developments, e.g. the Lytro [1] and Raytrix [2] cameras, are associated
with the need to provide the user with a more immersive, powerful and faithful visual experience of the
world around them.
To better understand the impact of these innovative developments, it is useful to revisit the
fundamentals of the human vision process in an attempt to discover and design new ways of representing
the visual world information [37]. In this context, the so-called plenoptic function assumes particular
relevance; it represents all the visual information in the world by considering the intensity of light that
can be seen from any 3D spatial position (viewpoint at x,y,z coordinates), from every possible angle
(angular viewing direction 𝜃, 𝛷), for every wavelength, λ, and at every time, t. This complete function
provides a powerful representation framework of the world visual information using 7 degrees of
freedom, P(x,y,z,𝜃, 𝛷, 𝜆,t) [38], as shown in Figure 17. The intensity of the light transported in a light ray
is denoted as radiance, i.e. the magnitude of each light ray.
Figure 17 – Illustrating the plenoptic function [37].
Despite the plenoptic function being able to provide a complete, powerful and conceptually simple
7D representation of the visual world, its high dimensionality implies a tremendous amount of data.
Thus, for practical and realistic applications, it is necessary to conveniently sample it, perhaps reducing
its dimensionality in an appropriate way. Moreover, the adopted plenoptic function sampling will have to
be performed while avoiding the occurrence of aliasing. There are some useful assumptions to simplify
the sampling of the plenoptic function, notably: 1) considering that the light rays travel in a 3D space
without regions of occlusion (i.e. free space), the radiance carried by a light ray remains unchanged along
its path through empty space, reducing one spatial dimension [39]; 2) fixing the time to the capturing
moment; and 3) considering only the wavelength of the electromagnetic waves in the visible spectrum
[37].
Considering the situation where the three previous sampling assumptions can be applied (i.e. a static
4D sampling of the plenoptic function), the notion of a 4D light field emerges corresponding to the
amount of light flowing through each point in space (x,y), for all directions (𝜃, 𝛷) [39], i.e. formally the
magnitude of each light ray (radiance) traveling in the empty space.
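In discrete form, a 4D light field is simply a four-dimensional array of radiance samples, commonly written L(u, v, s, t) with two angular and two spatial coordinates. The toy numpy sketch below (synthetic data, illustrative naming) shows the two canonical slices of such an array:

```python
import numpy as np

# Toy 4D light field L[u, v, s, t]: (u, v) index the angular direction,
# (s, t) the spatial position; the values here are synthetic.
U, V, S, T = 5, 5, 32, 32
L = np.random.rand(U, V, S, T)

# A sub-aperture image: fix one direction, keep all spatial samples.
center_view = L[U // 2, V // 2]          # shape (S, T), one 2D perspective

# The angular samples at one spatial position: all ray directions
# passing through a single point.
angular_patch = L[:, :, S // 2, T // 2]  # shape (U, V)
```

A conventional camera records only one such 2D slice (effectively integrating over the angular axes), which is exactly the information loss discussed above.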
As mentioned above, the recent development of new sensors and cameras aims to provide a more
immersive and powerful visual experience to the user. The most notable emerging developments in
this area are the so-called light field cameras which embrace this innovative imaging representation
approach; examples of the recent Lytro [1] and Raytrix [2] light field cameras are shown in Figure 18(a)
and (b), respectively. These new cameras make it possible to conveniently sample the plenoptic function as a 4D
light field, since they have a micro-lens (i.e. lenslet) array on the optical path, inserted between the
digital sensor and the main lens, as illustrated in Figure 18(c). Each lenslet captures a slightly different
perspective of the scene by considering directional information, meaning that the radiance for each
wavelength range is captured for each position and for each angle. Every sub-image (i.e. each
perspective view captured by each lenslet) differs a little bit from its neighbor sub-image, since the
incoming light rays are deflected slightly differently according to the corresponding lenslet position in the
array, as shown in Figure 18(d).
(a) (b)
(c) (d)
Figure 18 – Light field cameras and imaging acquisition system: (a) Lytro Illum camera [1]; (b) Raytrix camera [2]; (c) imaging acquisition system [39]; (d) micro images formed behind the micro-lens array.
Since these new cameras capture a higher dimensional representation of the visual information in a
scene, the resulting data allows new interesting functionalities compared to conventional photography,
notably: 1) to define which parts of the image should be in focus or not (i.e. interactive focus shifting)
after the image has been acquired (i.e. using post-processing); 2) to use larger apertures (i.e. the
opening of the lens's diaphragm through which light passes), facilitating photographs taken in low-light
environments without using a flash; 3) to change slightly the viewpoint for the visualization process [39];
4) to perform realistic recoloring and relighting of the acquired scene by using the abundant visual
information captured; and 5) to generate 3D stereo images from a single light field camera by using the
captured information associated to the scene depth [37].
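The first functionality, post-capture refocusing, is commonly implemented by "shift-and-add": each sub-aperture view is shifted proportionally to its angular offset from the center view and all views are averaged. The sketch below is a generic textbook formulation (not the Lytro processing chain), assuming a 4D array L[u, v, s, t] and an integer-pixel shift for simplicity:

```python
import numpy as np

def refocus(light_field, alpha):
    """Shift-and-add refocusing sketch over a 4D light field L[u, v, s, t].

    alpha selects the synthetic focal plane: each sub-aperture view is
    shifted by alpha times its angular offset from the center view, then
    all shifted views are averaged.
    """
    U, V, S, T = light_field.shape
    cu, cv = U // 2, V // 2
    out = np.zeros((S, T))
    for u in range(U):
        for v in range(V):
            dy = int(round(alpha * (u - cu)))
            dx = int(round(alpha * (v - cv)))
            out += np.roll(light_field[u, v], (dy, dx), axis=(0, 1))
    return out / (U * V)
```

Scene points at the depth matching alpha are reinforced coherently by the averaging, while points at other depths are blurred, which is what makes interactive focus shifting possible after acquisition.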
While this innovative imaging representation paradigm yields richer images produced by light field
cameras, these images cannot be directly and immediately displayed on traditional displays, which are
designed to display conventional 2D arrays of radiance. Thus, to visualize the scene
in some chosen way, it is first necessary to computationally extract from the 4D light field representation
a specific 2D image corresponding to a selected viewpoint. To perform this task, it is necessary to use
the so-called computational imaging methods that are able to render 2D images from the captured 4D
light fields [39]. However, to fully benefit at the display time of the 4D light field representation, new light
field displays are already being developed which are able to replicate in front of the viewer the originally
captured directional light field.
Naturally, if 360º panoramas were until now created from multiple conventional 2D images, it is
expected that 360º panoramas will in the future be created from multiple 4D light field images. Since
they are based on richer elementary images, these new panoramas are expected to provide the user
with additional functionalities. While this research field is still in its infancy, some first solutions already
exist in the literature and will be reviewed in the following.
3.2. Reviewing the Main Light Field based 360º Panorama Creation Solutions
This section will review two representative light field based 360º panorama creation solutions from the
literature. The solutions that will be reviewed were selected considering their technical approach in order
to make this review more thorough and valuable for the reader. The first reviewed solution follows an
approach similar to the traditional 360º panorama creation solutions (reviewed earlier in Section 2.3).
The second reviewed solution addresses light field based 360º panorama creation in a more innovative
way since it does not require performing depth reconstruction or extraction and matching of image
features as it happened in the first light field based solution reviewed.
3.2.1. Solution 1: Light Field based 360º Panorama Creation using Invariant
Features based Alignment
This section will review the solution developed by Lu et al. in [40]. This solution is designed to enable
light field based full-view panorama image creation using a feature based alignment technique to register
the set of input images. This solution corresponds to a light field based 360º panorama creation solution
using a feature-based registration approach.
A. Objectives and Technical Approach
The main objective of this solution is to enable light field based 360º panoramic image creation from
images acquired by the Lytro light field camera [1] using a feature-based image registration approach.
To accomplish this objective, this solution begins with image acquisition of the 3D world scene using
the Lytro light field camera [1]. Next, the depth information of the acquired scene is extracted by creating
a light field image stack per input collection image, where each image within the stack is focused at a
different depth. Then, each light field image stack is flattened into an all-in-focus image by copying, for
each sample position of the all-in-focus image, the (co-located) sample data from the most focused
image within each stack. After this, all the all-in-focus images (associated to the light field image stacks)
are converted to grayscale and features are extracted from each of them. Once the features have been
extracted from all grayscale all-in-focus images, feature matching is performed between a given all-in-
focus image (the reference image) and each one of the remaining all-in-focus images, estimating the
corresponding transformation between them. Using the estimated transformation, the non-reference
grayscale all-in-focus images are translated and warped to the reference image coordinate system.
Then, image blending and stitching techniques are applied to all grayscale all-in-focus images (aligned
in the same coordinate system), generating the final light field panorama. Lastly, using a viewer tool
created by the authors of this solution, it is possible to enjoy an interactive view over the created light
field based panorama, with the possibility of performing zoom-ins and zoom-outs at any specific location.
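The stack-flattening step of this pipeline (copying, per position, the sample from the most focused layer) can be sketched with a simple per-pixel sharpness measure. The absolute Laplacian used below is a common contrast proxy assumed for illustration; [40] does not specify the exact measure:

```python
import numpy as np

def all_in_focus(stack):
    """Flatten a focal stack into an all-in-focus image (sketch).

    stack: (N, H, W) images focused at different depths. For every pixel,
    a local contrast measure (absolute Laplacian) picks the sharpest
    layer, and that layer's sample is copied to the output.
    """
    def laplacian_abs(img):
        padded = np.pad(img, 1, mode="edge")
        lap = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
               padded[1:-1, :-2] + padded[1:-1, 2:] - 4 * img)
        return np.abs(lap)

    contrast = np.stack([laplacian_abs(im) for im in stack])  # (N, H, W)
    best = np.argmax(contrast, axis=0)                        # sharpest layer
    ys, xs = np.indices(best.shape)
    return stack[best, ys, xs], best

# toy usage: layer 0 is smooth, layer 1 has a sharp feature at (2, 2)
layer0 = np.full((5, 5), 0.5)
layer1 = np.zeros((5, 5)); layer1[2, 2] = 1.0
img, best = all_in_focus(np.stack([layer0, layer1]))
```

As noted above, working on the flattened image makes feature detection easier because defocus blur is removed before descriptors are computed.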
B. Architecture and Main Tools
Figure 19 depicts the architecture of the light field based panoramic image creation solution using
invariant features based alignment.
Figure 19 – Architecture of the light field based panoramic image creation solution using invariant features
based alignment, including its interactive consumption.
A succinct walkthrough of the architecture depicted in Figure 19 is presented in the following, giving
more detail to the most relevant technical tools used:
1. Light Field Acquisition: Firstly, images are acquired using the Lytro light field camera [1];
besides recording the location and intensity of light as a regular digital still camera does, the
Lytro camera also records the direction of the light rays. This camera produces pictures
originally in the RAW format.
2. 3D Data Extraction: In this step, the depth information of the photographed scene is extracted.
For that purpose, a stack of JPEG images, each one focused at a different depth, is created from
each image in the input collection (in the RAW format), using the software provided by Lytro [1].
For each input image (in RAW format), this software generates the desired light field image stack
as a number of different images with a spatial resolution of 1080 x 1080 pixels each and
associates a depth value to each image, corresponding to the depth of the focus plane; each
image within the stack defines a layer. The number of images (layers) in the stack depends on
how many different depth planes are necessary to cover the full focus range of the RAW image.
3. Feature Detection and Extraction: This step aims at detecting and extracting distinctive
features from each image. To achieve this goal, each light field image stack is first flattened into
an all-in-focus image. This is done by copying, for each sample position of the all-in-focus image,
the (co-located) sample data from the most focused layer; the most focused layer (image within
the stack) is determined by comparing the contrast at each sample position between layers.
This process allows easier and more accurate feature detection because it reduces the blurring
effect associated to image areas that are poorly focused. Each all-in-focus image (associated
to a light field image stack) is then converted to grayscale and SURF features [11] are extracted
from it. SURF descriptors characterize how pixel intensities are distributed within a scale
dependent neighborhood of each keypoint detected with the Fast Hessian detector, and offer
scale and rotation invariance (similarly to SIFT descriptors).
4. Image Alignment: After extracting features from all grayscale all-in-focus images, image
matching is performed. For that purpose, a reference image is first selected. In this case, the
first all-in-focus image is considered as the reference image, which means that the remaining
all-in-focus images will be aligned relative to it. Feature matching is then performed between
the reference (all-in-focus) image and the remaining all-in-focus images. After that, the RANSAC
algorithm [12] is applied to each pair of all-in-focus images in order to estimate the
transformation (homography) between them. As mentioned in Sections 2.1 and 2.3.2 B – Step 2,
feature matches that are geometrically consistent with the estimated homography are considered
inlier features, while the remaining features are considered outliers and, for that reason, are
discarded. The estimated homographies are then used to translate and warp
each image to the reference image coordinate system. Once the images are correctly aligned,
a JSON (JavaScript Object Notation) file is created in order to store the correspondences
found, including the location of the corners where the images align and the matching vertices.
5. Image Blending and Stitching: In this step, image blending and stitching techniques are
applied to the all-in-focus images previously aligned (step 4) to create the light field panorama.
Unfortunately, no details have been provided in [40] on the blending and stitching techniques
used.
6. Viewer: In this step, the user can enjoy an interactive view of the final light field based
panorama using an appropriate viewer tool. The viewer was created by the
authors and is an extension of the work developed by Behnam [41]. The user has the possibility
to interact with the created panorama, and thus enjoy an interactive experience, using the
mouse; it is possible to rotate, zoom-in and zoom-out the created panorama, etc. The zoom-ins
and zoom-outs are achieved by using the light field image stacks, which contain images focused
at different depths.
C. Performance and Limitations
Figure 20 shows a panoramic image created using 3 input images, acquired with the Lytro light field
camera, differing from each other only by rotation; each input image has a spatial resolution of 1080 x
1080 pixels.
Figure 20 – Panoramic Image created showing the regions of overlap between the all-in-focus grayscale
images [40].
Observing Figure 20, it is possible to conclude that this solution is able to perform accurate matching
between the all-in-focus grayscale images. Furthermore, it is also possible to observe that the final light
field panoramic image created is globally consistent, with only a few visible artifacts, e.g. the visible
discontinuity in the lamp post in the overlapping region between the second and the third image.
The most significant limitations of this light field based panoramic image creation solution are: 1) not
all depth information (i.e. depth map) generated by the Lytro light field software [1] about the acquired
scene is used, although it could lead to better results in the alignment process and help reducing the
presence of visible artifacts in the final panoramic image; 2) the viewer created by the authors shows
alignment problems in the zoom out transition between the child image (i.e. the most focused image at
that specific location present in the stack) and the parent image (i.e. the all-in-focus image corresponding
to that specific location in the final panorama). This happens when the child image is not perfectly aligned
with the large panorama; 3) the image stacks created by the Lytro software are optimized for web use,
so this software tries to make the stacks as small as possible, which may result in a stack with only one
image, from which it is impossible to extract accurate depth information (if this happens, the final
panoramic image may not be focused and the zoom-in and zoom-out accuracy may be compromised).
3.2.2. Solution 2: Light Field based 360º Panorama Creation using Regular Ray
Sampling
In this section, the solution developed by Birklbauer and Bimber in [42] will be reviewed. This solution
presents an innovative approach to record and compute light field based cylindrical 360º panoramas
where light rays are processed directly and, thus, it is not necessary to perform depth reconstruction or
extraction and matching of image features as it happened in the first light field based solution reviewed.
A. Objectives and Technical Approach
This solution has the primary objective of enabling light field based cylindrical 360º panoramic image
creation using all the information available from the set of input light fields acquired by the Lytro light
field camera [1].
To achieve the primary objective, this solution starts with the calibration of a Lytro camera based on the
calibration method described in [43], but with the particularity of supporting regular ray sampling. Then,
the scene's light field is acquired using the Lytro camera [1] mounted on a panoramic tripod head
specially designed and constructed by the authors [42] for this purpose. Next,
the RAW data produced by the Lytro camera is decoded to a regular 4D light field based on the approach
presented in [43]. Afterward, all input light fields are registered based on a minimization of a global root
mean square (RMS) per-pixel luminance registration error related to a set of extrinsic registration
parameters. Then, this registration procedure is applied over all possible panorama projections (related
with the cylindrical parameterization adopted). Finally, a blending step is applied to the set of registered
input light rays, generating the desired final cylindrical 360º light field panorama.
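The registration criterion can be illustrated with a much-simplified 1D analogue: a brute-force search over a horizontal pixel offset minimizing the RMS per-pixel luminance error in the overlap. This is an assumption-laden sketch; the actual method searches over the rotation angles αi and the ring distance dr of the cylindrical parameterization rather than a plain image shift:

```python
import numpy as np

def register_shift(ref, mov, max_shift):
    """1D registration sketch: find the horizontal offset of `mov` against
    `ref` that minimizes the RMS per-pixel luminance error in the overlap.
    """
    best, best_err = 0, np.inf
    w = ref.shape[1]
    for s in range(1, max_shift + 1):
        overlap_ref = ref[:, s:]          # right part of the reference
        overlap_mov = mov[:, :w - s]      # left part of the moving image
        err = np.sqrt(np.mean((overlap_ref - overlap_mov) ** 2))
        if err < best_err:
            best, best_err = s, err
    return best, best_err
```

In the full method this per-pair error is accumulated over all successive light field pairs, and the dr value minimizing the accumulated (global) registration error is retained.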
B. Architecture and Main Tools
Figure 21 illustrates the architecture of the light field based panoramic image creation solution using regular ray sampling.
Figure 21 - Architecture of the light field based panoramic image creation solution using regular ray sampling.
In the following, a short walkthrough of the architecture depicted in Figure 21 is presented while
reviewing in more detail the most relevant tools:
1. Calibration: In this step, the Lytro camera is calibrated according to the calibration method
described in [43]. This method determines the intrinsic and extrinsic parameters of a light field
camera and, furthermore, undistorts and rectifies the acquired input light fields. The outcome of
calibration and rectification (as explained in [43]) is an intrinsic transformation matrix that
transforms ray-coordinates i, j, k, l (i.e. micro-image pixel coordinates and lenslet indices) to
coordinates of ray intersection with two parameterized planes S’T’ and U’V’ (where S’T’ and
U’V’ represent the perspective and focal planes inside and outside the camera housing).
However, this intrinsic transformation matrix does not allow the desired regular ray sampling
and, thus, it had to be updated. To do that, the light field was re-parameterized by shifting the
S’T’ and U’V’ planes along their normal (i.e. camera’s optical axis) such that all rays with equal
i, j pixel coordinates focus in the same position on the shifted S’T’ (referred as ST plane) and all
rays with equal lenslet indices k, l have the same coordinates on the shifted U’V’ plane (referred
as UV plane). This led to a new intrinsic transformation matrix that allows regular ray sampling,
as desired. The intrinsic parameters depend on the desired zoom and focus settings of the camera's
main optics and must be pre-calibrated according to those settings.
2. Light Field Acquisition: Firstly, the scene's light field is acquired by rotating the panoramic
tripod head on which the Lytro light field camera [1] is mounted; the exposure and ISO speed
were kept constant. The Lytro camera has a 331 x 382 hexagonal micro-lens array ahead of a
3280 x 3280 CMOS sensor. Thus, the input light fields have a spatial resolution of 331 x
382 and an angular resolution of 11 x 11. This camera produces data originally in the RAW
format.
3. Decoding Lytro RAW Data: In this step, the RAW data produced by the Lytro camera (i.e. a
sensor image of nested hexagonally disposed micro-images) is converted into 9 x 9 perspective
images, each one having a resolution of 331 x 382, through sampling of corresponding entries
per perspective in each micro-image. Decoding the RAW data produced by the Lytro
camera into a regular 4D light field involves several steps, such as demosaicing, vignetting
correction, alignment, rectification, color correction and white balancing; while the former steps
are similar to the ones presented in [43], the latter two steps were performed by applying the
parameters extracted from the metadata of the first input light field to the remaining acquired
light fields of the same panorama, to prevent visible seams after the blending step. Then, each
perspective image is upsampled by a factor of two using linear interpolation (which depends on
image gradients in order to preserve edges) of the sub-pixel-shifted neighborhood.
4. Registration: In this step, all the input light fields are registered. Firstly, a set of (ideal) extrinsic
registration parameters is determined based on the pre-calibrated intrinsic parameters of the
light field camera (obtained for the desired focus and zoom settings of the camera’s main optics)
and the input light fields acquired under camera rotation. The extrinsic registration parameters
(i.e. the distance dr of the rotated ST planes to a common rotation axis and the angles of rotation
αi between successive input light field pairs) are related to the chosen cylindrical light field
parameterization, which requires a multi-view circular projection in one direction and a multi-view
perspective projection in the other direction [44]. Therefore, in a cylindrical light field
parameterization, rays are expressed on nested UV and ST cylinders instead of on two parallel
planes. The horizontal and vertical perspectives are characterized by
an angle β and a height h, respectively. Thus, each ray is parameterized by the parameters β,
h and by its intersection with the UV cylinder. For a given rotation angle between a pair of input
light fields, all rays belonging to that pair (in two-plane parameterization) are transformed with
respect to the current extrinsic parameters (dr and αi) and the rays (in cylindrical
parameterization) corresponding to the center perspective (β and h equal to zero) are selected.
The selected rays are projected from the ST plane onto the UV cylinder and are linearly
interpolated to match the sampling grid on the UV cylinder. Thus, for each pair of input light
fields, two partly overlapping cylindrically projected images (within the same cylindrical
parameter space) are computed. By minimizing the RMS per-pixel luminance error in the
overlapping image regions the optimal angle of rotation between the corresponding input light
field pair is found. This procedure is then applied successively to the remaining pairs of input
light fields and the corresponding RMS per-pixel luminance error (i.e. registration error) is
accumulated, obtaining the global registration error of the panorama for a given dr value. The
optimal dr value is the one that minimizes the global registration error.
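The angle search that minimizes the RMS per-pixel luminance error can be sketched as follows; this toy version reduces the cylindrical re-projection to a simple horizontal column shift between two already-projected images, which is a strong simplification of the procedure described above:

```python
import numpy as np

def rms_error(a, b):
    """RMS per-pixel luminance error between two overlapping image regions."""
    return np.sqrt(np.mean((a - b) ** 2))

def best_overlap(pano_a, pano_b, widths):
    """Brute-force search over candidate overlap widths (a hypothetical
    stand-in for the search over rotation angles alpha_i): pick the width
    minimizing the RMS error between the overlapping regions."""
    best = None
    for w in widths:
        # Overlap: last w columns of pano_a against first w columns of pano_b.
        err = rms_error(pano_a[:, -w:], pano_b[:, :w])
        if best is None or err < best[1]:
            best = (w, err)
    return best

# Toy example: pano_b overlaps pano_a by exactly 10 columns.
rng = np.random.default_rng(0)
pano_a = rng.random((20, 40))
pano_b = np.hstack([pano_a[:, -10:], rng.random((20, 30))])
width, err = best_overlap(pano_a, pano_b, range(2, 20))
print(width)  # 10
```

The same exhaustive-search idea extends to the dr parameter: the global (accumulated) registration error is evaluated per candidate dr and the minimizer is kept.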
5. Re-parameterization: In this step, the procedure adopted to register the input light fields
according to the center perspective (β and h equal to zero) is repeated for all remaining
panorama perspectives (β and h different from zero), considering simultaneously an appropriate
sampling in horizontal and vertical angular perspective directions. This is done with the goal of
guaranteeing that the final cylindrical 360º light field panorama supports the same perspective
sampling as the input light fields and that this sampling is symmetric in both directions.
6. Blending: In this step, the different input light rays are blended in order to create the final
seamless cylindrical 360º light field panorama. Firstly, every necessary ray belonging to the
same input light field is linearly weighted using a vertically constant hat function. The hat function
is centered with respect to each pair of light field projections at the center of the input ray bundle.
After that, and considering each horizontal perspective, the overlapping light field projections
are alpha-blended in order to create a pleasant panorama perspective. At the end of this step,
the final cylindrical 360º light field panorama is available.
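The hat-function weighting and blending of two aligned overlapping projections can be sketched as follows (a minimal grayscale sketch; the function names are illustrative):

```python
import numpy as np

def hat_weights(width):
    """Vertically constant hat (triangle) weighting across the columns,
    peaking at the center of the ray bundle and falling to zero at the
    borders."""
    x = np.linspace(0.0, 1.0, width)
    return 1.0 - np.abs(2.0 * x - 1.0)

def alpha_blend(img_a, img_b):
    """Alpha-blend two aligned overlapping projections with hat weights
    (minimal sketch of the per-perspective blending step; border columns
    where both weights vanish are degenerate in this toy version)."""
    wa = hat_weights(img_a.shape[1])[None, :]
    wb = hat_weights(img_b.shape[1])[None, :]
    return (wa * img_a + wb * img_b) / (wa + wb + 1e-12)

a = np.full((4, 5), 1.0)
b = np.full((4, 5), 3.0)
out = alpha_blend(a, b)
print(out[0, 1:-1])  # [2. 2. 2.] (equal weights -> average)
```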
C. Performance and Limitations
In the reported experiments [42], the light field acquisition was performed using a Lytro light field camera
[1] (in everyday mode, i.e. default refocus range, with 1.5x zoom) mounted on a panoramic tripod head.
Figure 22 depicts the resulting light field panoramas obtained with the proposed solution (first, second
and third rows) and with the solution reviewed in Section 2.3.2 (fourth row), denominated here as
AutoStitch; in the latter solution (AutoStitch), the panorama has been obtained through an independent
stitching procedure of the perspective images. The AutoStitch solution failed to globally register full
360º panoramas of complex acquired scenes, thus only fractions of the set of input light fields were
registered together. The resulting light field panorama created with the solution reviewed in this section
has a resolution of 8,805 x 662 x 9 x 9. The first and second rows of Figure 22 illustrate the final light
field panorama focused on the furthest (back focus) and closest (front focus) plane of the acquired scene.
The green squares in both light field panoramas (back and front focus) highlight two specific sections
that were compared between both solutions (the solution reviewed in this section and the AutoStitch
solution) in terms of refocusing (Figure 22 third and fourth rows).
Figure 22 - Panoramas created with the AutoStitch solution and the light field based 360º panorama creation solution reviewed in this section [42].
Since the AutoStitch solution processes the spatial and directional domains of the light field
independently, the light field panorama output by this solution is inconsistent in the directional
domain. As can be observed in Figure 22 (fourth row), this leads to noticeable artifacts, particularly in
refocusing, mainly due to the relatively large parallax in the acquired scene. Conversely, the solution
reviewed in this section processes the spatial and directional domains of the acquired light field jointly and,
therefore, provides correct results, particularly when refocusing (Figure 22, third row).
The biggest limitations of the solution reviewed in this section are: 1) it shares the
limitations of common light field acquisition (e.g. artifacts in the input light fields, due to under-sampling
in the spatial or directional domain, will also be noticeable in the output light field panorama); 2) it
demands a dense ray sampling (i.e. small parallax between light field perspectives) since it
uses linear interpolation to estimate missing light rays in the registration and re-parameterization
processes; and 3) it relies on some assumptions (e.g. the rotation point belongs to the
optical axis, the camera does not rotate around its optical axis, among others) which, if strongly violated,
can lead to failures.
Chapter 4
4. Light Field based 360º Panorama Creation:
Architecture and Tools
In this chapter, the proposed light field based 360º panorama creation solution is presented. In this
context, it starts by describing the global system architecture and walkthrough, followed by a detailed
description of the main parts, namely the light field data pre-processing module (which corresponds to
a solution already available in the literature [45]) and the key modules used to create the panorama light
field image. The creation of a light field panoramic image requires taking multiple light field images with
a suitable camera, at different camera angles. The created light field panorama should preserve the
reality of the scene as much as possible, thus preserving the directional ray information, which allows
the perspective and the objects in focus to be changed a posteriori.
4.1. Global System Architecture and Walkthrough
The main goal of this section is to describe the global system architecture and walkthrough of the
proposed light field based 360º panorama creation solution. Each light field image is acquired with the
Lytro Illum camera [46] and is represented as a 2D matrix of 15x15 sub-aperture images (called here
a perspective image stack). The idea followed here is to create a light field panorama from a set of
2D panoramas obtained by stitching all the perspective image stacks (light field images). This method,
named here multi-perspective image stitching, stitches together the set of perspective images
at the same location of the image stack, using classical 2D panorama creation techniques. Figure 23
illustrates 3 different light field images (as perspective image stacks) and the association between
perspective images of different light field captures which will be used as input for the stitching process.
The idea is to first perform stitching on the central images (yellow rectangles and arrows) located at
position (8,8) of the sub-aperture light field image (note that the first sub-aperture
image, which is a black sub-aperture image, has index (1,1)) and, then, derive some
(registration) parameters which are used to perform the stitching of the remaining perspective images (red
rectangles and arrows). This keeps the disparity between the different perspectives of
the light field panorama similar to the disparity between the perspectives of each light field image
(before stitching). In addition, by reusing the same (registration) parameters obtained from the central
perspective image, the process becomes more coherent, i.e. any stitching errors will occur in
all perspective images, and the perspective image content will only change due to occlusions or new
content and not due to a different deformation or blending between adjacent perspective images. The
stitching process yields a set of 2D perspective panoramic images which are regarded as the
final panoramic light field. The perspective images within the 4 corner regions of the matrix (for each light
field image presented) highlighted in green and labeled with a green letter “B” are not used to create
the final light field 360º panorama because they are black or very dark images and thus not useful.
These perspectives are replaced in the final light field 360º panorama by black panoramas. All the
perspective panoramas used (i.e. the 2D perspective panoramas and the black panoramas) have the
same resolution.
Figure 23 – Illustration of the stitching process of light field images (represented as perspective images stacks).
The proposed multi-perspective image stitching solution is inspired by the work of Brown and Lowe
in [24] previously reviewed in Section 2.3.2. This solution is able to create a light field based 360º
panorama using a feature-based registration approach. The projection type used to create the final light
field 360º panorama is the spherical projection. Figure 24 depicts the global system architecture of the
proposed light field based 360º panorama creation solution.
Figure 24 – Global system architecture of the proposed light field based 360º panorama creation solution.
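The core idea of multi-perspective image stitching (register once on the central perspectives and reuse the parameters for every other perspective) can be sketched as follows; the brute-force overlap search here is a hypothetical stand-in for the feature-based registration actually used:

```python
import numpy as np

def register_central_overlap(center_a, center_b, max_width):
    """Estimate the overlap width between two central perspective images by
    brute-force sum-of-squared-differences over candidate column overlaps
    (stand-in for the feature-based registration described in the text)."""
    best, best_err = 1, np.inf
    for w in range(1, max_width):
        err = np.mean((center_a[:, -w:] - center_b[:, :w]) ** 2)
        if err < best_err:
            best, best_err = w, err
    return best

def stitch_all_perspectives(stack_a, stack_b):
    """Register ONCE on the central perspective, then reuse the same
    parameter (here: a single overlap width) for every other perspective,
    so all perspectives undergo exactly the same deformation."""
    n_i, n_j = stack_a.shape[:2]
    overlap = register_central_overlap(stack_a[n_i // 2, n_j // 2],
                                       stack_b[n_i // 2, n_j // 2], 15)
    panos = np.concatenate([stack_a, stack_b[:, :, :, overlap:]], axis=3)
    return overlap, panos

rng = np.random.default_rng(1)
scene = rng.random((3, 3, 10, 50))                 # 3x3 perspective stacks
stack_a, stack_b = scene[:, :, :, :30], scene[:, :, :, 20:]
overlap, panos = stitch_all_perspectives(stack_a, stack_b)
print(overlap, panos.shape)  # 10 (3, 3, 10, 50)
```

Because the parameters come only from the central views, every perspective panorama is assembled identically, which is the coherence property argued for above.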
In the following, a brief walkthrough of the proposed solution depicted in Figure 24 is presented:
1. Light Field Acquisition: In this step, the light field of a visual scene is acquired from different
perspectives using a Lytro Illum light field camera [46], a Nodal Ninja 4 panoramic tripod head
[47] and a Manfrotto 190CXPRO 4 tripod [48]. The Lytro Illum camera is mounted on the
panoramic tripod head which rotates around a central rotation point, with a constant rotation
angle between each acquisition, to perform the acquisition of all parts (in the horizontal plane)
of the visual scene. Due to the acquisition procedure, the light field panorama obtained may
have a FOV of 360º in the horizontal direction and approximately 62º in the vertical direction
(corresponding to the vertical FOV of the Lytro Illum camera, since no vertical rotation is performed
in this acquisition step). Regarding the Lytro Illum camera, the light rays are collected by a
CMOS sensor (with 7728 × 5368 samples) containing an array of pixel sensors organized in a
Bayer-pattern filter mosaic as illustrated in Figure 25(a); this sensor produces GRBG RAW
samples with 10 bit/sample. Naturally, a lenslet array on the optical path (illustrated in Figure 25(b))
allows capturing the different light directions. The Lytro camera stores the acquired
information in the so-called .LFR files; this is a container format that stores various types of data,
notably the Raw Bayer pattern GRBG image, associated metadata, a thumbnail in PNG format
and system settings, among others. The remaining acquisition conditions are described in
Section 5.1.
Figure 25 - Lytro Illum light field camera: (a) GRBG Bayer-pattern filter mosaic [49]; and (b) imaging acquisition system [50].
2. Light Field Data Pre-Processing: In this step, the RAW light field data produced by the Lytro
Illum camera is pre-processed to obtain a 4D light field, i.e. a 4D array with two
ray direction indices and two spatial indices of pixel RGB data. Several operations are
performed, namely demosaicing, devignetting, transforming and slicing, and finally color
correction (the rectification light field processing is not performed). The pre-processing applied to
the RAW light field data uses the Light Field Toolbox developed by D. Dansereau [51],
which is explained in detail in Section 4.2. Afterwards, all the 2D perspective images (of the 4D light
field) are stored, i.e. 193 perspective images are extracted from the 4D Light Field (LF) array
and stored in bitmap format (the images obtained from one light field image
correspond to a perspective image stack). Also, a directory file listing all extracted
perspective images obtained from the set of light field images that compose the final panorama
is written. This file indicates the order of the input light field images and associated perspective
image stacks which will be used in the creation of the final light field 360º panorama; note that
the order needs to be sequential, i.e. the images need to be processed according to their position
in the final light field panorama. All perspective images
have a resolution of 623 x 434 pixels. Ideally, the resolution would be 625 x 434 pixels but, due
to the presence of black pixels in the first and last columns of each perspective image, it was
necessary to remove the first and the last column of each perspective image.
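The column cropping and the exclusion of the dark corner perspectives can be sketched as follows; the circular distance threshold is an illustrative assumption chosen only to reproduce the 193 kept / 32 excluded split stated above, not the exact criterion used:

```python
import numpy as np

def prepare_perspective_stack(lf):
    """Crop the black first/last columns of every perspective image
    (625 -> 623) and flag the dark corner perspectives of the 15x15 grid.
    `lf` has shape (15, 15, rows, cols)."""
    cropped = lf[:, :, :, 1:-1]              # drop first and last column
    i, j = np.meshgrid(np.arange(15), np.arange(15), indexing="ij")
    # Assumed circular mask: perspectives far from the grid center are
    # vignetted to (near) black and excluded from stitching.
    dark = (i - 7) ** 2 + (j - 7) ** 2 > 61
    return cropped, dark

lf = np.zeros((15, 15, 434, 625))
cropped, dark = prepare_perspective_stack(lf)
print(cropped.shape, int((~dark).sum()), int(dark.sum()))
# (15, 15, 434, 623) 193 32
```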
3. Central Perspective Images Registration: In this step, the central perspective images located
at position (8,8) of the sub-aperture light field image (one for each different perspective image
stack created in the previous step) are registered. The goal of this step is to obtain a set of
registration parameters that will be used to perform the composition (next step) of all perspective
images of each different light field. The main processes involved in this step are feature
detection and extraction, image matching, an initial (rough) pairwise camera parameters
estimation (intrinsic and extrinsic parameters), global camera parameters refinement, wave
correction and final perspective panorama scale estimation. This processing module is
described in detail in Section 4.3. The outcome of this process is the registration parameters,
which are the camera intrinsic and extrinsic parameters.
4. Composition: In this step, all corresponding perspective images in each different perspective
image stack are composed to produce the perspective panoramic images required to create
the final light field panorama. Thus, the goal of this step is to compose all 2D panoramas, i.e.
one 2D panorama for each different perspective of the 4D light field. The composition
of these 2D panoramas uses the central perspective images registration parameters
previously estimated (camera parameters). The main processes involved in this step are image
warping (where the spherical projection is applied), exposure compensation, seam detection
and blending. The first perspective panorama created is the central perspective panorama, since
the creation of the remaining panoramas uses information (image warped masks and corners)
obtained from the composition of the central perspective in the blending process. The perspective
images that correspond to the corners of the 2D array of light field images and are too dark (as
previously described) are replaced by black panoramic images with the same resolution as
the created 2D panoramas. The outcome of this process is a set of perspective 2D
panoramas (193 perspective panoramic images and 32 black panoramas), all with the same
resolution.
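The spherical projection applied during image warping can be sketched with the standard forward mapping from textbook panorama stitching; this is assumed to be the same idea as OpenCV's spherical warper, not its exact implementation:

```python
import numpy as np

def spherical_forward_map(x, y, f):
    """Forward spherical projection: a pixel (x, y), expressed relative to
    the principal point of a camera with focal length f, maps to panorama
    coordinates (f * theta, f * phi)."""
    theta = np.arctan2(x, f)                        # longitude
    phi = np.arctan2(y, np.sqrt(x * x + f * f))     # latitude
    return f * theta, f * phi

f = 500.0
u, v = spherical_forward_map(np.array([0.0, f]), np.array([0.0, 0.0]), f)
print(u, v)  # u = [0, f*pi/4] ~ [0, 392.7], v = [0, 0]
```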
5. Light Field 360º Panorama Creation: In this step, all perspective 2D panoramic images are
rearranged into a 4D light field in the same way as the input light field is represented after the
pre-processing module previously described in Step 2. By storing the final 360º light field
in this 4D format, it is possible to perform some rendering, e.g. extract a single
perspective panoramic image or refocus a posteriori on a specific depth plane of the acquired
visual scene, in the same way as with a usual light field image.
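The a posteriori refocusing enabled by the 4D representation can be sketched with a minimal shift-and-sum renderer (integer shifts only; real renderers use sub-pixel interpolation, and the `slope` parameter controlling the focal plane is illustrative):

```python
import numpy as np

def refocus(lf, slope):
    """Shift-and-sum refocusing over a 4D light field LF(i, j, k, l): each
    perspective is shifted proportionally to its angular offset from the
    center view and the shifted views are averaged."""
    n_i, n_j = lf.shape[:2]
    ci, cj = n_i // 2, n_j // 2
    acc = np.zeros(lf.shape[2:], dtype=float)
    for i in range(n_i):
        for j in range(n_j):
            dy, dx = round(slope * (i - ci)), round(slope * (j - cj))
            acc += np.roll(lf[i, j], (dy, dx), axis=(0, 1))
    return acc / (n_i * n_j)

# A perspective-independent toy scene refocuses to itself at slope 0.
lf = np.tile(np.arange(20.0).reshape(4, 5), (3, 3, 1, 1))
img = refocus(lf, slope=0)
print(np.allclose(img, lf[1, 1]))  # True
```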
4.2. Light Field Toolbox Processing Description
This section describes the internal processing of the Light Field Toolbox developed by D.
Dansereau [51], which is largely used in this Thesis. Figure 26 illustrates the architecture of the light field
image processing flow in the Light Field Toolbox (LFT) software [45]. The dashed modules represent
processing steps that need to be performed only once (offline) to obtain parameters that are
relevant for the processing of all remaining input light field images.
Figure 26 - Light Field Toolbox software: light field images processing architecture.
In the following, the walkthrough of the architecture in Figure 26 is presented, notably by highlighting
for each module its main objectives as well as inputs and outputs:
1. Lenslet Grid Model/Structure Estimation: In this step, a collection of white light field images
is used to create a set of lenslet grid models/structures; each Lytro Illum camera has its own
collection of white light field images internally stored, which is also available in the external
memory card of the camera. A white image is an image captured using a real-world setup that
requires a white diffuser (i.e. a translucent material used to soften the hard light produced, for
example, by a strobe light). The acquisition of the white light field images is performed by the
manufacturer (only once) when each camera is produced. For each white image in the
camera memory card, a lenslet grid model/structure (which is hexagonally packed in the case
of the Lytro Illum camera, as shown in Figure 27(a)) is generated and stored as a *.grid.json file. An
example of a white image also displaying its predicted lenslet centers (illustrated as red dots) is
shown in Figure 27(b). The zoom and focus settings of each white light field image in the camera
memory card are stored in a file called WhiteFileDatabase.mat; this file will later be used to select
the proper white light field image to perform the transforming and slicing process, but also the
devignetting operation, as will be further explained. This step is executed using the
LFUtilProcessWhiteImages command of the LFT software and is explained in [51].
Figure 27 - Hexagonal micro-lens array: (a) close up [52]; and (b) example of a white image and associated estimated lenslet centers represented as red dots [51].
2. Camera Calibration Parameters Estimation: In this step, a set of camera calibration
parameters is estimated (in advance of the processing of each of the light field
images); this is performed using a collection of calibration grid/checkerboard light field images
(previously acquired), each one acquired with a different pose. The camera calibration
parameters will be mainly used to compensate the angular difference between the
corresponding rays of adjacent pixels (within a micro-lens) and the radial distortion due to the
shape of the micro-lenses. To estimate the camera calibration parameters from the
checkerboard images it is necessary to perform: i) feature detection: corners are located in the
checkerboard light field images, as illustrated in Figure 28; ii) initialization: pose and intrinsic
parameters for each image are initialized; iii) optimization: pose and intrinsic parameters without
lens distortion (minimizing an RMSE) are coarsely estimated. Then, a second optimization
considering lens distortions is performed; and finally iv) refinement: camera intrinsic and poses
estimated parameters are refined. The outcome of this step is a set of camera calibration
parameters, notably plenoptic camera intrinsic and micro-lens radial distortion parameters
(stored in a file called CalibrationDatabase.mat). The rectification process will use the estimated
calibration parameters to attenuate the radial distortion effect which is present due to the
characteristics of the lenselet array as well as all the main optical lenses. This procedure is
explained in detail in [51]. This process is executed using the LFUtilCalLensletCam and
LFUtilProcessCalibrations commands of the LFT software.
Figure 28 - Example of: (a) calibration pre-processed light field checkerboard image; (b) checkerboard corners identification [51].
3. Demosaicing: In this step, a conventional linear demosaicing technique is applied to the
acquired raw image. The demosaicing process is responsible for producing a full color RGB lenslet
light field image from the non-overlapping Bayer pattern color samples of the Lytro Illum camera.
This process may produce undesirable effects for some pixels, notably those closer to the
micro-lens edges due to the reduction of light intensity; for that reason, the edge pixels may have to
be ignored. This process is described in [43] and is executed using the LFUtilDecodeLytroFolder
command of the LFT software. An example of an image before and after demosaicing is
illustrated in Figure 29.
Figure 29 - Example of an image: (a) before and (b) after demosaicing [53].
4. Devignetting: In this step, the vignetting effect in the RGB lenslet light field image is corrected.
The vignetting effect is the darkening of the pixels near the border of each micro-lens. Thus, from
the white images database previously created (as described in Step 1), the appropriate
white image for each acquired light field is selected; the RGB lenslet image is then divided by
the chosen white image to compensate the lower intensity close to the micro-image edges [43].
The outcome of this step is a devignetted light field, i.e. a light field where the vignetting
effect is (almost) not present. This process is described in [43] and is executed using the
LFUtilDecodeLytroFolder command of the LFT software. An example of a demosaiced raw
lenslet image without vignetting effect correction is illustrated in Figure 30.
Figure 30 - Example of a demosaiced raw lenslet image before devignetting [43].
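The division by the white image can be sketched as follows (a minimal sketch; the toolbox's actual implementation also handles normalization and hot-pixel details):

```python
import numpy as np

def devignet(lenslet_img, white_img, eps=1e-6):
    """Divide the demosaiced lenslet image by the matching white image to
    compensate the intensity fall-off near micro-image borders [43]."""
    white = np.maximum(white_img, eps)      # avoid division by zero
    return np.clip(lenslet_img / white, 0.0, 1.0)

# Toy white image: micro-image center bright, borders darker.
white = np.array([[0.5, 0.8, 0.5],
                  [0.8, 1.0, 0.8],
                  [0.5, 0.8, 0.5]])
captured = 0.6 * white                      # uniform grey scene, vignetted
print(devignet(captured, white))            # ~0.6 everywhere
```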
5. Transforming and Slicing: In this step, the devignetted RGB lenslet light field image is aligned
and sliced to a square grid of micro-images. Originally, the accurate placement of the micro-lens
array in the camera’s optical path is unknown and each lenslet element is spaced from
its neighbors by a non-integer multiple of the pixel/sample pitch (i.e. the distance from the
center of one pixel to the center of the next pixel). Thus, the outcome of this step is a lenslet
light field image organized into a square grid of lenslets and thus also micro-images. Each
micro-image has a resolution corresponding to the number of directions for which light intensity
is measured (the angular resolution). For Lytro Illum, the lenslet grid includes 625 x 434
elements and each lenslet element captures the light rays coming from 15 x 15 different
directions. After this step, each light field image is stored as a 4D matrix called LF(i,j,k,l,c) which
contains all sub-aperture images (i.e. perspective images), called here a perspective
image stack. This process is also explained in [43] and is executed using the
LFUtilDecodeLytroFolder command of the LFT software.
6. Color Correction: In this step, the previously obtained 4D light field image organized as a grid of
micro-images is color corrected. The main operations involved in this LFT module are gamma
correction, RGB color correction and color balancing (i.e. a global adjustment of the intensity of
the colors). This is achieved by using some light field metadata, e.g. the basic RGB color and
the gamma correction parameters. The outcome of this step is a color corrected lenslet light
field image and this process is executed using the LFUtilDecodeLytroFolder command of the
LFT software while including the ColourCorrect task.
7. Rectification: In this final step, each color corrected light field image is rectified with the goal
of significantly reducing the radial distortion of the micro-lenses (mainly due to their fly’s-eye
shape). The light field rectification process relies on the camera calibration parameters (i.e.
plenoptic camera intrinsic model and micro-lens radial distortion parameters) previously
estimated in Step 2, as described before. Thus, the outcome of this step is a rectified lenslet
light field image corresponding to a rectangular grid of lenslet images, aka micro-images. This
last step is executed using the LFUtilDecodeLytroFolder command of the LFT software by
including the Rectify argument and is explained in [43].
The final output light field image is stored as a 4D matrix called LF(i,j,k,l,c) (available after
the transforming and slicing processing module) with size (15,15,434,625,4). This matrix has the
following data indexing: (i,j) corresponds to the coordinates of each pixel within each micro-image, i.e.
when the first two indices i and j are fixed it corresponds to a 2D perspective image; (k,l) corresponds
to the spatial coordinates of the lenslet element in the array; (c) corresponds to the color component,
notably the 3 RGB components, but can also include a weight channel representing the confidence
associated to each pixel intensity value.
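This indexing convention translates directly into array slicing; for instance, extracting the central perspective image (1-based position (8,8) in the text, index 7 in 0-based arrays):

```python
import numpy as np

# LF(i, j, k, l, c): (i, j) selects the perspective, (k, l) the pixel inside
# it, and c the color (3 RGB components plus an optional weight channel).
LF = np.zeros((15, 15, 434, 625, 4), dtype=np.uint8)

# Central perspective (8, 8) in the 1-based indexing used in the text:
central = LF[7, 7, :, :, :3]
print(central.shape)  # (434, 625, 3)
```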
4.3. Main Tools: Detailed Description
This section describes in detail the main tools of the proposed light field based 360º panorama creation
solution. In the implementation of the proposed solution, the OpenCV library [54] was used, in which
some of the processing modules described in the following are implemented. The main tools are the
central perspective images registration and composition processes of the global system architecture
presented in Section 4.1.
4.3.1. Central Perspective Images Registration Processing Architecture
The central perspective images registration architecture of the proposed light field based 360º
panorama creation solution is shown in Figure 31. The main goal of the central perspective images
registration is to compute a set of registration parameters from all central perspective images of the
different perspective image stacks (obtained from the several 4D LF images covering different areas of
the visual scene). These central perspective images registration parameters will be used to compose all the
different perspective panoramas in the composition process described in detail in Section 4.3.2.
Figure 31 – Central perspective images registration architecture of the proposed light field based 360º
panorama creation solution.
In the following, a walkthrough of the registration process architecture illustrated in Figure 31 is
presented, with the main tools described in more detail:
1. Feature Detection and Extraction: In this step, local features [11] are detected and extracted
from all central perspective images (one for each perspective image stack) using the SURF
feature detector and extractor [10]. The SURF detector is a blob detector which is based on
the Hessian matrix to find points of interest. The SURF descriptors characterize how pixel
intensities are distributed within a neighborhood of each detected point of interest (keypoint).
SURF descriptors are robust to rotation, scale and perspective changes in a similar way to the
SIFT descriptors. Figure 32 shows the features detected from 2 overlapping central perspective
images (no feature scale or orientation is shown to allow the visualization of the content and
keypoint descriptor location).
Figure 32 – Features detected and extracted from 2 overlapping central perspective images.
2. Sequential Image Matching: In this step, the set of features detected and extracted (from all
central perspective images of each perspective image stack) in the previous step is pairwise
matched according to the order presented by the directory file (created in Section 4.1 - Step 2).
This order reflects the position of each acquired light field image in the final light field 360º
panorama. The feature matcher proceeds as follows: 1) for a given feature in one image, the two
best-matching descriptors in the other image are identified, yielding two candidate matches;
2) the two corresponding distances, which express how similar the two descriptors involved
in each match are, are computed; 3) the ratio between these two distances is computed, and
the best match is preserved only if it is sufficiently better than the second best (i.e. the distance
ratio passes a given threshold). This process is
repeated for every feature detected in one of the images. Afterwards, the RANSAC algorithm [12] with
DLT [29] is applied to each pair of central perspective images, estimating the transformation
model (i.e. homography) between them. After estimating the homography between each pair of
overlapping central perspective images, the features that are coherent with the estimated
transformation model are classified as inliers and the remaining ones are classified as outliers
and filtered out (removed). Figure 33 illustrates the image matching between the 2 overlapping
central perspective images after applying the RANSAC algorithm (i.e. inlier matches). Again,
the scale and orientation of the descriptors are not shown.
Figure 33 – Image Matching after applying RANSAC algorithm (inlier matches).
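The distance-ratio test used by the matcher can be sketched in a few lines; this pure-numpy version keeps a match when the best distance is clearly smaller than the second best, which is one common way of expressing the ratio criterion described above:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.7):
    """Lowe-style ratio test: for each descriptor in desc_a, find its two
    nearest descriptors in desc_b and keep the match only when the best
    distance is below `ratio` times the second-best distance."""
    matches = []
    for ia, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if best < ratio * second:
            matches.append((ia, int(order[0])))
    return matches

rng = np.random.default_rng(2)
desc_b = rng.random((6, 64))
desc_a = desc_b[:3] + 0.01 * rng.random((3, 64))   # 3 true correspondences
print(ratio_test_matches(desc_a, desc_b))  # [(0, 0), (1, 1), (2, 2)]
```

The surviving matches would then be fed to RANSAC homography estimation exactly as described in the text.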
3. Rough Camera Parameters Estimation: In this step, the camera intrinsic (focal length) and
extrinsic parameters (camera rotation) are roughly estimated. For each pair of overlapping
central perspective images, the camera intrinsic (focal length) and extrinsic (rotation)
parameters are estimated from the corresponding homography under the assumption that the
camera undergoes a pure rotation to capture different areas of the visual scene. All
transformations (i.e. homographies) used to estimate the camera intrinsic and extrinsic
parameters are generated from the previously estimated sequential pairwise matches (Step 2).
Then, the median of all estimated focal length values (one for each pair of overlapping
central perspective images) is taken as the focal length value to be used in the next step.
Camera translation is assumed to be zero during the whole light field 360º panorama creation
pipeline.
4. Global Camera Parameters Refinement: In this step, the camera intrinsic (focal length) and
extrinsic parameters (rotation) roughly estimated in the previous step are globally refined with a
global alignment procedure over each pair of matching images thus reducing accumulated
registration errors resulting from the sequential pairwise image registration. This is achieved
using a bundle adjustment technique [27] which simultaneously refines the camera intrinsic
(focal length) and extrinsic (camera rotation) parameters. The bundle adjustment technique only
considers the overlapping pairs of images that have a confidence value (which expresses the
reliability of the estimated homography for each pair) above a given threshold. In this case, the
bundle adjustment technique minimizes the sum of the distances between the rays passing
through the camera centers and the SURF features matched in Step 2. The
Levenberg-Marquardt algorithm [28] is used to update the camera parameters by minimizing
the sum of squared projection errors associated with the projection of each feature into the
overlapping images with corresponding features.
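The Levenberg-Marquardt iteration used for this refinement can be illustrated on a generic toy least-squares problem; this is a simplified sketch of the damped update rule, not the camera refinement itself (the toy exponential model is an assumption made purely for illustration):

```python
import numpy as np

def levenberg_marquardt(residual, jac, x0, iters=50, lam=1e-3):
    # Minimize ||residual(x)||^2 with the damped normal equations
    #   (J^T J + lam I) dx = -J^T r,
    # shrinking lam after a successful step and growing it otherwise.
    x = np.asarray(x0, float)
    cost = np.sum(residual(x) ** 2)
    for _ in range(iters):
        r, J = residual(x), jac(x)
        A = J.T @ J + lam * np.eye(len(x))
        dx = np.linalg.solve(A, -J.T @ r)
        new_cost = np.sum(residual(x + dx) ** 2)
        if new_cost < cost:
            x, cost, lam = x + dx, new_cost, lam * 0.5
        else:
            lam *= 10.0
    return x

# Toy problem: recover (a, b) in y = a*exp(b*t) from exact samples
t = np.linspace(0, 1, 8)
y = 2.0 * np.exp(1.5 * t)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jacf = lambda p: np.stack([np.exp(p[1] * t),
                           p[0] * t * np.exp(p[1] * t)], axis=1)
p = levenberg_marquardt(res, jacf, [1.0, 1.0])
```

In the thesis pipeline the residuals are the feature projection errors and the unknowns are the focal length and camera rotations; the damping makes the iteration robust to the rough initialization produced in Step 3.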
5. Wave Correction: In this step, a panorama straightening technique is used with the goal of
reducing the wavy effect that may occur in each final 2D perspective panoramic image. This
technique straightens the final panorama by correcting the camera extrinsic parameters
(i.e. rotation) to keep the horizon level. The wavy effect is due to the unknown motion of the
camera rotation central point relative to a chosen world coordinate frame, since it is rather hard
to keep the camera rotation central point perfectly static and stable during the acquisition of all
the light field images that compose the final panorama. Since only horizontal camera rotations
are considered during the whole light field 360º panorama creation pipeline, the unknown
motion of the camera rotation central point is not accounted for in the previous registration
steps. The camera parameters are updated according to a global rotation applied such that the
vector normal to the horizontal plane containing both the horizon and the camera centers is
vertical in the projection plane. Figure 34 illustrates the result of applying the described
panorama straightening technique to a perspective panoramic image.
(a)
(b)
Figure 34 – Wave correction examples: (a) without and (b) with applying the panorama straightening technique. Both examples presented are the final panorama that was obtained after all composition steps.
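The global rotation described above can be sketched as follows; this is a simplified illustration in which the global 'up' direction is estimated as the vector least aligned with the camera x-axes, and then applied as a single correcting rotation (a sketch under these assumptions, not the exact straightening implementation used in this work):

```python
import numpy as np

def wave_correct(rotations):
    # Estimate the global 'up' direction as the vector least aligned with all
    # camera x-axes (smallest eigenvector of the sum of x x^T), then apply one
    # global rotation so that this direction becomes the world vertical.
    X = np.stack([R[0] for R in rotations])         # camera x-axes (first rows)
    _, vecs = np.linalg.eigh(X.T @ X)
    up = vecs[:, 0]                                 # smallest-eigenvalue vector
    if up[1] < 0:
        up = -up
    z = np.mean([R[2] for R in rotations], axis=0)  # average viewing direction
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    G = np.stack([x, up, np.cross(x, up)])          # global correction rotation
    return [R @ G.T for R in rotations]

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Demo: horizontal pans contaminated by a common 5 degree tilt of the tripod
tilt = rot_x(np.deg2rad(5))
cams = [rot_y(np.deg2rad(t)) @ tilt for t in (0, 15, 30, 45)]
straight = wave_correct(cams)
```

After the correction, the cameras in the demo become pure horizontal rotations again, i.e. their y-axes coincide with the world vertical, which is exactly the straightening effect shown in Figure 34.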
6. Final Perspective Panorama Scale Estimation: In this step, the perspective panoramic image
scale of all 2D perspective panoramas is estimated according to a specific focal length value.
This is done by sorting in ascending order all the focal length values previously refined (i.e.
updated in the global camera parameters refinement step) and selecting the middle value of
this set. This step can be performed in parallel with the previous one since the focal length
values are not changed further and are already available after Step 4. The selected value will
be used later in the image warping of all perspective panoramic images.
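The middle-value selection described above amounts to the following; note that for an even number of focal lengths this sketch picks the upper middle element, an assumption since the text does not specify that case:

```python
def panorama_scale(focals):
    # Sort the refined focal lengths in ascending order and select the middle
    # element, which becomes the scale of the final perspective panoramas.
    ordered = sorted(focals)
    return ordered[len(ordered) // 2]
```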
4.3.2. Composition Processing Architecture
This section describes in detail the composition process architecture of the proposed light field based
360º panorama creation solution. Figure 35 depicts the composition process architecture. The
composition process aims to create all perspective panoramic 2D images by using the previously
estimated registration parameters of the central perspective image panorama. The registration
parameters required by the composition module are the camera (intrinsic and extrinsic) parameters. The
first perspective panorama created is always the central perspective panorama. The dashed modules
and arrows represent a processing step (seam detection) that is only performed for the central
perspective images. The creation of the remaining panoramas requires some information from
the image warping and seam detection processes of the central perspective panorama, namely top-left
corners (relating the position of each image in the final light field 360º panorama) and image warped
masks, respectively. The orange arrow going from the blending process back to the image warping process
symbolizes the iteration loop over all perspective images of the different perspective image stacks, i.e. the
proposed solution iterates over all perspective stacks to create a set of perspective panoramas.
Figure 35 – Composition architecture of the proposed light field based 360º panorama creation solution.
In the following, the walkthrough of the composition process architecture illustrated in Figure 35 is
presented while describing in detail the main tools:
1. Image Warping: In this step, image warping is performed using all perspective image stacks
and the central perspective images registration parameters (i.e. camera intrinsic and rotation
parameters) previously estimated in the registration process. The goal of this process is to apply
a deformation of all input images according to the selected projection and to obtain a set of top-
left corners that will be used in the blending process later described. Thus, all perspective
images are projected/warped using a spherical rotation warper according to the final perspective
panorama scale value and the camera parameters (i.e. intrinsic parameters and rotation)
previously estimated in the central perspective images registration process. Besides the warped
images, the output of this step is also a collection of top-left corners (one corner for each warped
image). The top-left corners obtained from the image warping of the central perspective image
are used later in the blending process of all remaining perspectives of the final light field 360º
panorama. All warped images will be used later in the exposure compensation and blending
processes. Figure 36 illustrates a central perspective image before (Figure 36(a)) and after
(Figure 36(b)) undergoing the described image warping process (to help visualize the difference
between the two images, it is advised to look at the left border of Figure 36(b)).
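The spherical warping of a single pixel can be sketched as follows; this is a simplified forward mapping under an assumed pinhole model with the principal point at the origin, not the exact warper used in this work:

```python
import numpy as np

def warp_point(p, K, R, scale):
    # Back-project pixel p through K, rotate the ray by the camera rotation R,
    # and map it onto the sphere: u is the longitude, v the colatitude,
    # both multiplied by the panorama scale (the selected focal length).
    ray = R @ np.linalg.inv(K) @ np.array([p[0], p[1], 1.0])
    x, y, z = ray / np.linalg.norm(ray)
    u = scale * np.arctan2(x, z)
    v = scale * (np.pi / 2 - np.arcsin(y))
    return u, v

# Demo: the principal point of a camera panned by 30 degrees lands at
# longitude 30 degrees on the spherical panorama
K = np.array([[500.0, 0, 0], [0, 500.0, 0], [0, 0, 1.0]])
a = np.deg2rad(30)
R_pan = np.array([[np.cos(a), 0, np.sin(a)],
                  [0, 1, 0],
                  [-np.sin(a), 0, np.cos(a)]])
u0, v0 = warp_point((0, 0), K, np.eye(3), 1.0)
u1, v1 = warp_point((0, 0), K, R_pan, 1.0)
```

Applying this mapping to every pixel of an image, and recording the minimum (u, v) over the image, yields the warped image and its top-left corner used later in blending.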
(a) (b)
Figure 36 - Image warping example: (a) before and (b) after applying image warping to a central perspective image.
2. Exposure Compensation: In this step, an exposure compensation technique [55] is used with
the goal of attenuating the intensity differences between the warped images that compose each
final 2D perspective panorama. The technique tries to remove exposure differences
between overlapping perspective images by adjusting image block intensities. By dividing each
warped image into blocks and making use of the overlapping and non-overlapping information for
each pixel, soft transitions are achieved within a perspective panorama containing various
overlapping regions and also between different overlapping perspective images.
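The core of such gain compensation can be sketched as a small linear least-squares problem over per-image gains; this is a simplified global (one gain per image) version, not the block-based technique of [55], and the regularization weight is an illustrative assumption:

```python
import numpy as np

def gain_compensation(pairs, n, lam=0.01):
    # pairs: tuples (i, j, mean_i, mean_j) with the mean intensities of images
    # i and j inside their common overlap region. Minimize
    #   sum (g_i*mean_i - g_j*mean_j)^2 + lam * sum (g_k - 1)^2
    # over the per-image gains g via the normal equations.
    A = lam * np.eye(n)
    b = lam * np.ones(n)
    for i, j, mi, mj in pairs:
        A[i, i] += mi * mi
        A[j, j] += mj * mj
        A[i, j] -= mi * mj
        A[j, i] -= mi * mj
    return np.linalg.solve(A, b)

# Demo: image 1 is 20% darker than image 0 inside their overlap
gains = gain_compensation([(0, 1, 100.0, 80.0)], 2)
```

The data term equalizes the overlap intensities (the darker image is brightened, the brighter one dimmed) while the prior keeps all gains close to one, which produces the soft transitions described above.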
3. Seam Detection: In this step, a graph-cut seam detection technique [56] is used with the goal
of estimating seams, i.e. lines which define how the overlap areas in the warped images will
contribute to the creation of the final perspective panoramic image. With this goal in mind, image
masks and seams are estimated jointly so as to find the optimal seams between
overlapping central perspective images (note that this step is only performed for the central
perspective images, which is the reason why this module is dashed). The graph-cut seam
detection technique determines the optimal position of each seam between all warped central
perspective images, enabling the composition process of all perspective panoramas. This
technique creates the image masks which define the seams used to compose all images of the
panorama, using the top-left corners obtained from the image warping process previously
described. Figure 37 illustrates an image mask resulting from the seam detection process over all
central perspective images. The white region defines the position of the central perspective
image corresponding to the presented image mask, and the lines that separate the white region
from the black one are the detected seams.
Figure 37 – Image mask example.
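Although the thesis uses a graph-cut formulation [56], the idea of placing the seam where the overlapping images agree best can be illustrated with a simpler dynamic-programming vertical seam (this sketch is not the graph-cut method; it is a minimal stand-in over a synthetic difference map):

```python
import numpy as np

def find_seam(diff):
    # diff: per-pixel squared difference between two warped images inside
    # their overlap (H x W). Returns one column index per row forming the
    # minimal 8-connected vertical seam via dynamic programming.
    h, w = diff.shape
    cost = diff.astype(float)
    for r in range(1, h):
        left = np.r_[np.inf, cost[r - 1, :-1]]
        right = np.r_[cost[r - 1, 1:], np.inf]
        cost[r] += np.minimum(np.minimum(left, cost[r - 1]), right)
    seam = np.empty(h, int)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(h - 2, -1, -1):
        lo = max(0, seam[r + 1] - 1)
        hi = min(w, seam[r + 1] + 2)
        seam[r] = lo + int(np.argmin(cost[r, lo:hi]))
    return seam

# Demo: the two images agree perfectly along column 2, so the seam follows it
d = np.ones((5, 6))
d[:, 2] = 0.0
seam = find_seam(d)
```

A graph-cut formulation generalizes this idea to arbitrarily shaped seams by minimizing the same kind of difference cost over a pixel graph instead of a single column per row.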
4. Blending: In this step, a multi-band blending technique [29] is applied to the region where
images are overlapping. The goal of this technique is to attenuate some undesired effects that
may exist in each final perspective panorama, such as visible seams due to exposure
differences, blurring due to misregistration, ghosting due to objects moving in the scene, radial
distortion, vignetting, parallax effects, among others. A detailed description of this blending
technique is available in Section 2.3.2 B – Step 6. This step uses the image masks obtained in
the seam detection step (which is only performed for the set of central perspective images), the
top-left corners associated with the central perspective images, and the warped perspective
images needed to create a perspective 2D panorama.
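The essence of multi-band blending can be illustrated with a two-band, one-dimensional sketch: low frequencies are mixed with a smoothed mask, high frequencies with the hard seam mask. This is a simplified stand-in (a moving average replaces the Gaussian pyramid of [29]):

```python
import numpy as np

def box_blur(x, k=5):
    # 1D moving average as a stand-in for the Gaussian low-pass of a pyramid
    return np.convolve(x, np.ones(k) / k, mode='same')

def two_band_blend(a, b, mask):
    # mask is 1.0 where image 'a' wins (from seam detection) and 0.0 where
    # 'b' wins; low frequencies are mixed with a smoothed mask, high
    # frequencies with the hard mask.
    low_a, low_b = box_blur(a), box_blur(b)
    soft = box_blur(mask)
    return (soft * low_a + (1 - soft) * low_b
            + mask * (a - low_a) + (1 - mask) * (b - low_b))

# Demo: blend a bright and a dark strip with a seam in the middle
a = np.ones(20)
b = np.zeros(20)
mask = np.r_[np.ones(10), np.zeros(10)]
out = two_band_blend(a, b, mask)
```

Blending low frequencies over a wide region hides exposure differences, while keeping high frequencies on the hard seam preserves sharp detail, which is why the technique attenuates visible seams without introducing ghost edges.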
After finishing the blending process for a given perspective panoramic image, the proposed solution
starts the composition of the next perspective panorama (i.e. it goes back to Step 1 of the composition
process), which corresponds to the neighboring perspective to the right. The final outcome of this
process is a set of perspective 2D panoramic images which, all together and rearranged into a 4D array,
can be understood as a light field panorama.
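The rearrangement into a 4D array can be sketched as follows; the grid and image sizes are illustrative (a tiny resolution is used here), and the 0-based central index (7, 7) corresponds to the 1-based position (8,8) used in the text:

```python
import numpy as np

# Hypothetical sizes: a 15x15 grid of perspective panoramas of H x W pixels
S, T, H, W = 15, 15, 4, 8                  # tiny H, W for illustration only
panoramas = {(s, t): np.full((H, W), s * 100 + t)
             for s in range(S) for t in range(T)}

# Rearrange the set of 2D perspective panoramas into a 4D light field array
light_field = np.zeros((S, T, H, W))
for (s, t), img in panoramas.items():
    light_field[s, t] = img

central = light_field[7, 7]  # extract the central perspective (0-based index)
```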
Chapter 5
5. Light Field based 360º Panorama Creation:
Assessment
In this chapter, the performance of the light field based 360º panorama creation solution proposed will
be assessed. To achieve this goal, this chapter begins by introducing the test scenarios used for the
assessment of the proposed solution and the corresponding acquisition conditions, followed by the
presentation and analysis of results for a representative number of light field panorama examples; the
analysis will consider both the multi-perspective and refocus capabilities.
5.1. Test Scenarios and Acquisition Conditions
This section intends to describe the test scenarios designed to appropriately assess the performance of
the light field based 360º panorama creation solution proposed. In addition, it intends to describe the
acquisition conditions adopted for each test scenario.
5.1.1. Test Scenarios
The design of appropriate test scenarios is critical for the good assessment of the light field based 360º
panorama creation solution. Each test scenario attempts to reproduce relevant acquisition conditions of
a common user in a given real scenario. Each scenario will enable the assessment of different
capabilities, e.g. refocus capability, perspective shift/parallax, among others, of the created light field
based 360º panoramas. Table 1 summarizes the main characteristics defining the test scenarios, notably
the position of the interesting objects (i.e. the objects that may be a posteriori refocused) and the camera
refocus range used in the acquisition. The camera refocus range refers to the range of depth planes a
priori selected by the user at the acquisition moment for which a posteriori refocusing should be possible.
In Table 1, each combination of characteristics is labelled with a letter and a number combination where
the letter corresponds to the position of the interesting objects and the number to the position of the test
scenario in Table 1. Some of the combinations were not pursued because they have no practical
relevance, i.e. they have no interest from the point of view of possible real scenarios.
Table 1 – Test scenario characteristics.

Camera Refocus Range | Interesting objects close to the camera | Interesting objects close and far away from the camera | Interesting objects far away from the camera
Short | Test A.1 | - | -
Short | Test A.2 | - | -
Large | - | Test B.3 | Test C.3
In the following, the selected test scenarios defined in Table 1 are briefly discussed to highlight their
added value for the performance assessment to be made later:
A. Interesting objects close to the camera and short camera refocus range: This test case was
designed to evaluate the performance of the proposed solution when all objects are very close to
the camera, thus with large disparity between the partly overlapping light field images used in the
panorama creation process. In this context, the following two panoramas were created:
Case A.1: Room with toys 1 – As the whole scene is within the camera refocus range, it will
be possible to refocus on any region of the visual scene and observe a large disparity between
different perspectives; also the background (the last scene’s depth plane containing objects) is
rather close to the camera. Taking into account the characteristics of this test scenario, an indoor
environment was selected. Figure 38 illustrates the central perspective/view panorama
extracted from the created light field 360º panorama for the Room with toys 1 case which
belongs to test scenario A.1.
Figure 38 – Central view for the Room with toys 1 light field 360º panorama corresponding to test scenario A.1.
Case A.2: Room with toys 2 – The interesting objects in the scene are within the camera
refocus range but the scene background is not. This case should evaluate the performance of
the proposed solution when the background is outside the refocus range; in this case, in theory,
the background will always be blurred despite the light field refocusing capabilities. The fact that
the background is blurred in each individually captured light field image can compromise the
capability of the proposed solution to conveniently extract and match features between the sub-
aperture images of the partly overlapping captured light field images. Taking into account the
characteristics of this test scenario, again an indoor environment was selected. Figure 39 shows
the central perspective panorama extracted from the light field 360º panorama for the Room
with toys 2 case which belongs to test scenario A.2.
Figure 39 – Central view for the Room with toys 2 light field 360º panorama corresponding to test scenario A.2.
B. Interesting objects are close and far away from the camera and large camera refocus range:
This case was designed to evaluate the proposed solution when the interesting objects are near
and far away from the camera, thus with very different disparities (from large to small). Since there
will be objects in this test that are very far away from the camera, the background is necessarily
rather far away from the camera.
Case B.3: Sea landscape and Park landscape – As the whole scene acquired is within the
camera refocus range, the entire visual scene may be refocused after the acquisition moment
and it will be possible to notice very distinct disparities (large disparities corresponding to closer
objects and small disparities corresponding to far away objects) in the created panorama.
Additionally, the scene background is within the refocus range. Figure 40 illustrates the two
central perspective panoramas extracted from two different light field 270º panoramas (Figure
40(a) and Figure 40(b) illustrate the Sea landscape and Park landscape cases, respectively)
which belong to test scenario B.3. The examples presented in Figure 40 do not encompass the
full horizontal FOV.
(a)
(b)
Figure 40 – Light field 270º panoramas corresponding to test scenario B.3: (a) Sea landscape; and (b) Park landscape.
C. Interesting objects far away from the camera and large camera refocus range: This case was
designed to evaluate the performance of the proposed solution when the whole scene to be acquired
is within the refocus range but there are relevant objects very far away from the camera. In this
case, all the objects have very small disparities.
Case C.3: Empty park: As the whole scene acquired is within the camera refocus range, the
scene objects present small disparities, which may compromise the refocus capability. Figure 41
presents the central perspective panorama extracted from the light field 300º panorama for the
Empty park case, which belongs to test scenario C.3. The example in Figure 41 does not cover
the full horizontal FOV.
Figure 41 – Central view for the Empty Park light field 300º panorama corresponding to test scenario C.3.
5.1.2. Acquisition Conditions
This section intends to describe the acquisition conditions that were used in the test scenarios described
above. In all acquisition tests, a Lytro Illum camera [46], a Nodal Ninja 4 panoramic tripod head [47] and
a Manfrotto 190CXPRO 4 tripod [48] were used. Figure 42 presents the full acquisition system used in
all test scenarios previously described.
Figure 42 – Full acquisition system used.
For each acquisition case described above, the rotation angle around the camera’s optical center
between each acquisition and the camera zoom, focus and refocus range remained constant. In the
following, the camera settings used in the acquisition of the light field images for the defined test
scenarios are described, starting by presenting the common camera settings for all test scenarios and
then presenting the specific settings for each test:
A. Common camera settings:
Zoom ring position: minimum (capturing the largest possible horizontal and vertical field of
view).
Exposure mode: Manual Mode, where the user sets manually the ISO and shutter speed.
Exposure Value (EV) compensation: this is a measured value (ideally between -1 and +1)
in each acquisition; the measured EV compensation value for each acquisition should be
very close to the measured value for the acquisition performed immediately before in order
to avoid large differences in exposure in the final light field 360º panorama.
B. Specific camera settings:
Rotation angle between acquisitions: 15º (used in test scenarios A.1, A.2 and B.3 Sea
landscape) and 30º (used in test scenario B.3 Park landscape, and C.3).
Camera refocus range:
- Test A.1: 22cm to 6m.
- Test A.2: 21cm to 1m.
- Test B.3 (Sea landscape): 22cm to ∞.
- Test B.3 (Park landscape): 30cm to ∞.
- Test C.3: 30cm to ∞.
White balance mode: Auto White Balance mode which sets the camera white balance
automatically (used in test scenarios A.1, A.2 and B.3 Park landscape example). Sunny
mode when acquiring in a sunny environment (used in test scenarios B.3 Sea landscape
and C.3).
All the remaining camera acquisition settings used can be found in the metadata associated with each
light field image, e.g. focal length, ISO, shutter speed, among others.
5.2. Example Results and Analysis
This section intends to present some light field panorama examples created using the light field 360º
panorama creation solution proposed. Some of the panoramas capture the full horizontal FOV and are
called light field 360º, while others cover only a portion of the full horizontal FOV. As previously mentioned, the
light field 360º panorama examples presented in this section were acquired in different test scenarios,
each of which attempts to reproduce relevant acquisition conditions of a common user in a given real
scenario. Each considered test scenario makes it possible to assess different characteristics (refocus
capability, perspective shift, among others) of the light field based 360º panoramas created using the
developed solution.
5.2.1. Perspective Shift Capability Assessment
This section evaluates the perspective shift capability associated with the created light field 360º
panoramas. In the following, some created light field panoramas from the cases in Section 5.1.1 will be
used to show results.
Assessment Conditions
The light field panoramas created using the light field 360º panorama creation solution proposed are
represented (as stated in Section 4.1) by a set of 225 perspective panoramic images, where:
1) 193 are slightly different 2D perspective panoramas that result from the composition process of
corresponding perspective images of different perspective images stacks;
2) 32 are black panoramas that correspond to the corners of a sub-aperture light field image (i.e.
the perspective images located at the corners of each sub-aperture light field), which are too
dark to be used in the composition process.
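One plausible layout consistent with these counts is a circular aperture over the 15×15 perspective grid; the radius used below is an assumption (it is not stated in the text), chosen only because it happens to reproduce the stated counts of 32 dark corner views and 193 usable views:

```python
import numpy as np

# A 15x15 grid of perspectives; positions too far from the centre (outside a
# circular aperture) are the dark corner views. The radius of 8 grid units is
# an illustrative assumption reproducing the counts stated in the text.
s, t = np.mgrid[0:15, 0:15]
dist = np.hypot(s - 7, t - 7)
dark = dist > 8.0
```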
Each final light field panorama is organized into a 4D light field image format (i.e. organized as sub-
aperture images or perspective images), in the same way the input light fields are expressed after
undergoing the pre-processing procedure described in Section 4.1 – Step 2. This format makes it easy
to extract a specific 2D perspective panoramic image, which should be an important capability of the
light field 360º panorama creation.
Figure 43 illustrates a final light field panorama presented as a 2D matrix of perspective panoramic
images, i.e. the so-called sub-aperture images. Red rectangles indicate the perspectives
selected for each one of the five light field panoramas to be used to enable a convenient assessment of
the perspective shift capability after applying the proposed solution. Note that for all the test scenarios
previously referred the same five perspectives will be used. The perspectives located at the border of
the sub-aperture image, i.e. at the maximum angular distance from the central perspective, have
stronger problems related to vignetting and radial distortion, among others, as these problems
propagate from the perspective images used in their composing process (the yellow rectangle depicts
one of those perspectives).
Figure 43 – Light field panorama presented as a 2D matrix of perspective panoramic images.
The first perspective of each light field panorama to be presented is the central perspective (located
in position (8,8) of the 15x15 2D matrix of perspective panoramas) as this perspective is the one that
originates the central perspective images registration parameters used to compose the remaining
perspectives. Two of the four remaining perspectives are located five perspectives apart horizontally
(one to the left, located at the (8,3) position, and the other to the right, located at the (8,13) position) from
the central perspective. The last two perspectives are located six perspectives apart vertically (one
above, located at the (2,8) position, and the other below, located at the (14,8) position) from the central
perspective. The selection of this set of perspectives should enable a good analysis of the final light
field panoramas created in terms of the desired perspective shift capability. For each perspective that
will be assessed later in this section, some image close-ups will be presented (highlighted with red
rectangles in each corresponding perspective) when evaluating the perspective shift capability in both
the horizontal and vertical parallax directions of the perspective panoramas (two different close-ups will
be shown for each direction). The close-ups will be used to help visualize both perspective shifts. In
each different perspective that will be presented, the orange circles highlight some undesired effects
(described in the following) present in the acquired light field images when capturing a bright
environment and the yellow circles highlight noticeable 2D stitching artifacts. In each different close-up
presented, the red vertical and horizontal lines were drawn to help visualize the horizontal and vertical
perspective shifts, respectively.
Figure 44(a) shows some of the undesired effects previously mentioned for the perspectives located
at the border of the 2D matrix of perspective panoramas (or sub-aperture light field image); in this case,
the presented perspective panorama corresponds to the yellow highlighted perspective illustrated in
Figure 43, located at the (8,1) position of the 2D matrix of perspective panoramas. This perspective
panorama was extracted from the light field 270º panorama created for the evaluation of the test
scenario B.3, named Sea landscape. This light field panorama was created using 18 captured light field
images, where Figure 44(b) and Figure 44(c) illustrate the corresponding first and second perspective
images (extracted from the first and second acquired light field images) used in the compositing process
of the perspective panorama presented in Figure 44(a). As it is possible to conclude by observing Figure
44(b) and Figure 44(c), the undesired effects previously mentioned are present in all the perspective
images located at the border of each sub-aperture light field image (i.e. present in each different
perspective image stack) used to compose the final perspective panoramic image. This leads to
the presence of these effects, repeatedly, across the whole perspective panorama. This problem is
present in the perspectives at the border of each sub-aperture image of all the created light field
panoramas.
(a)
(b) (c)
Figure 44 – Extreme left perspective panorama example (position (8,1)) with undesired effects (such as vignetting and blurring): (a) perspective panorama located at the border of the perspective panoramas; (b) first and (c) second perspective images (extracted from
the first and second acquired light field images belonging to the presented light field panorama) used to compose the presented perspective panorama.
In addition, all the perspective images located at the border of each sub-aperture image are not
sharply focused (i.e. they appear blurred), as it is possible to see in Figure 44(b) and Figure 44(c). This
leads to perspective panoramas that are not sharply focused (see Figure 44(a)) at the border of each
sub-aperture light field panoramic image created.
Panorama by Panorama Perspective Shift Assessment
Test scenario A.1: Room with Toys 1
In the following, the light field 360º panorama created for the evaluation of the test scenario A.1,
named Room with toys 1, is presented. Figure 45 depicts five different perspectives that were selected
for the assessment of the horizontal and vertical perspective shift capability of the light field 360º
panorama created (according to the perspectives previously selected, see Figure 43): Figure 45(a)
presents the central perspective (8,8); Figure 45(b) the left perspective (8,3); Figure 45(c) the right
perspective (8,13); Figure 45(d) the top perspective (2,8); and Figure 45(e) the bottom perspective
(14,8). Figure 46 presents the horizontal perspective shift close-ups extracted from each perspective
panorama in Figure 45: Figure 46(a) and Figure 46(d) correspond to the two close-ups from the left
perspective (8,3); Figure 46(b) and Figure 46(e) correspond to the two close-ups from the central
perspective (8,8); lastly, Figure 46(c) and Figure 46(f) correspond to the two close-ups from the right
perspective (8,13). Figure 47 depicts the vertical perspective shift close-ups: Figure 47(a) and Figure
47(d) correspond to the two close-ups from the top perspective (2,8); Figure 47(b) and Figure 47(e)
correspond to the two close-ups from the central perspective (8,8); lastly, Figure 47(c) and Figure 47(f)
correspond to the two close-ups from the bottom perspective (14,8).
(a)
(b)
(c)
(d)
(e)
Figure 45 – Five perspectives extracted from the Room with toys 1 light field 360º panorama created for the test scenario A.1: (a) central perspective (8,8); (b) left perspective (8,3); (c) right perspective (8,13); (d) top perspective (2,8); and (e) bottom perspective (14,8).
(a) (b) (c)
(d) (e) (f)
Figure 46 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13).
(a) (b) (c)
(d) (e) (f)
Figure 47 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the bottom perspective (14,8).
The undesired effects highlighted with orange circles are due to the fact that the captured light field
images and, consequently, the sub-aperture images become overexposed when shooting a bright area
of the visual scene. The noticeable 2D stitching artifacts highlighted with yellow circles are probably due
to incorrect global alignment of all light field images and associated sub-aperture images. In addition,
the blending technique used could not correct them in an imperceptible way.
As expected, it is not very easy to recognize the horizontal and vertical perspective shifts just by
looking at the five different perspectives presented in Figure 45. As can be observed from the
horizontal and vertical perspective shift close-ups depicted in Figure 46 and Figure 47, respectively,
the light field 360º panorama created for the evaluation of the test scenario A.1 (named Room with
Toys 1) presents the desired perspective shift capability. From Figure 46(a), Figure 46(b) and Figure
46(c), it is possible to notice a slight horizontal perspective shift by looking at the position of the vertical
red line drawn: in Figure 46(a) the red line is located over the table post; in Figure 46(b) the red line is
drawn against the table post limit and, in Figure 46(c), the line is slightly away from the table post limit.
From Figure 46(d), Figure 46(e) and Figure 46(f), it is possible to observe the same horizontal
perspective shift by looking at the position of the vertical red line drawn relative to the clothes hanger.
Thus, by shifting from position (8,3) of the sub-aperture image, corresponding to the close-ups of Figure
46(a) and Figure 46(d), to position (8,13), corresponding to the close-ups of Figure 46(c) and Figure
46(f), it is possible to observe a slight horizontal perspective shift to the left in the location of the objects
in the acquired scene. From Figure 47(a), Figure 47(b) and Figure 47(c), it is possible to perceive a
slight vertical perspective shift through inspection of the horizontal red line location relative to the
background scene and the Eiffel Tower toy. This slight vertical perspective shift is most noticeable in
Figure 47(d), Figure 47(e) and Figure 47(f), again by looking at the position of the red line relative to
the clothes hanger. The observed vertical perspective shift is expected since the considered shift in
perspective is done from position (2,8) of the sub-aperture image, corresponding to the close-ups of
Figure 47(a) and Figure 47(d), to position (14,8), corresponding to the close-ups of Figure 47(c) and
Figure 47(f), i.e. a downward shift in perspectives of the sub-aperture image. However, the observed
perspective shifts are rather small both in the horizontal and vertical directions. This occurs because the
light field images used to create the presented light field 360º panorama were acquired using the Lytro
Illum camera [46], which has a lenslet array on the optical path, inserted between the digital sensor and
the main lens, that is rather small. Thus, the design of this light field camera does not allow capturing
large disparities between different perspectives. This was a bigger limitation when acquiring
visual scenes where the interesting objects are relatively far away from the camera. Thus, the
light field panoramas acquired for visual scenes where the interesting objects are very close to
the camera (which is the case of the test scenario A.1, named Room with Toys 1, and the test scenario
A.2, named Room with Toys 2) will present much larger disparity between different perspectives of the
sub-aperture image.
Test scenario A.2: Room with Toys 2
In the following, the light field 360º panorama produced for the assessment of the test scenario A.2,
named Room with toys 2, is presented. As previously stated in Section 5.1, the added value of this test
is to evaluate the performance of the proposed solution when the background is outside the refocus
range. Thus, in theory, each light field image used to create the presented light field 360º panorama
should present a blurry background which can interfere in the extraction and matching of features
between the sub-aperture images of the partially overlapping light field images. However, the proposed
solution could conveniently create the desired light field 360º panorama. Figure 48(a) presents the
central perspective (8,8), where the orange circles highlight some noticeable camera undesired effects
(i.e. overexposure) and the yellow circles highlight noticeable 2D stitching artifacts. Figure 48(b) and
Figure 48(c) are close-ups to allow a better inspection of these problems.
Figure 48 - Perspective extracted from the Room with toys 2 light field 360º panorama created for the test scenario A.2: (a) central perspective (8,8); (b) and (c) two close-ups presenting camera overexposure problems and 2D stitching artifacts.
As for test A.1 with Room with toys 1, the presence of undesired camera acquisition effects in the final
light field image (observe the highlighted orange circles in Figure 48) is due to the fact that the captured
light field images and, consequently, the sub-aperture images become overexposed when acquiring a
bright area of the visual scene. This was a considerable limitation of the camera when acquiring this
type of visual scene, and every light field panorama created presents this type of problem. The 2D
stitching artifacts highlighted with yellow circles (see Figure 48(b) and Figure 48(c)) are probably due to
incorrect global alignment (i.e. the global camera parameter refinement which applies the bundle
adjustment technique) of all light field images and associated perspective images. In addition, the
blending technique used (i.e. multi-band blending) could not correct them in an imperceptible way, thus
leading to noticeable artifacts in the final light field panorama. By inspection of Figure 48(c), it is possible
to see that the background of the central perspective image is not blurred as expected. Although the
camera refocus range does not cover the background, the camera assumes that all objects in the
background scene are in a depth plane coincident with the last depth plane selected a priori by the user
(i.e. in the range of depth planes selected using the camera refocus range) at the acquisition moment,
for which a posteriori refocusing should be possible. This fact enables the proposed solution to
conveniently create the desired light field panorama, even though it could otherwise have been a limitation.
Test scenario B.3: Sea landscape
In the following, the light field 270º panorama produced for the assessment of the test scenario B.3,
named Sea landscape, is presented. Figure 49 presents the corresponding five different perspectives
highlighted in red in Figure 43. Figure 50 and Figure 51 illustrate the horizontal and vertical perspective
shift close-ups, respectively, extracted from each perspective in Figure 49.
The perspectives depicted present the same undesired effects that originate when acquiring a
bright area of the visual scene using the Lytro Illum camera [46]. The 2D stitching artifacts highlighted
with orange circles (see Figure 49) come again from the incorrect global alignment of all light field
images and associated perspective images and the inability of the blending technique used to correct
them. All these effects are replicated in the created perspective panoramas, as can be observed in
Figure 49.
As can be noticed in the horizontal and vertical perspective shift close-ups presented in Figure 50
and Figure 51, respectively, the light field 270º panorama produced for the assessment of the test
scenario B.3, named Sea landscape, presents the desired perspective shift capability. Looking at
Figure 50(a), Figure 50(b) and Figure 50(c), it is possible to see a small horizontal perspective shift by
observing, again, the red line in each close-up: in Figure 50(a), the red vertical line is over the arm of
the person at the center of the image; in Figure 50(b), the red line is against the limit of the person's
arm; and in Figure 50(c), the red line is slightly away from the limit of the person's arm. Again, looking
at Figure 50(d), Figure 50(e) and Figure 50(f), it is possible to observe the same horizontal perspective
shift. From Figure 51(a), Figure 51(b) and Figure 51(c), it is possible to see a small vertical perspective
shift by observing the location of the horizontal red line relative to the metal grid post in the background
scene. The same small vertical perspective shift can be seen in Figure 51(d), Figure 51(e) and Figure
51(f), where the change in the position of the peninsula in the scene background relative to the
horizontal red line is noticeable. The horizontal and vertical perspective shifts present in the close-ups
in Figure 50 and Figure 51 are consistent with each other and small. They also give the impression
that the light field 270º panorama created for the assessment of the test scenario B.3, named Sea
landscape, has a smaller perspective shift capability compared to the previously presented test
scenarios (Room with toys 1 and Room with toys 2). This was expected since the distance from the
camera to the majority of the interesting objects in the acquired scene (the persons in the scene) is
much larger than in the previously presented tests. Since the light field images were captured using the
Lytro Illum camera [46], whose micro-lens array is very limited due to its very small size, the amount of
disparity captured is smaller because the distance between the camera and the interesting objects is
much larger than the distance of the objects in the test scenarios previously presented.
Test scenario B.3: Park landscape
The light field 300º panorama created for the evaluation of the test scenario B.3, named Park
landscape, is not presented here since this case leads to the same conclusions in terms of perspective
shift capability as for the previous test case B.3, named Sea landscape.
Figure 49 – Five perspectives extracted from the Sea landscape light field 270º panorama created for the test scenario B.3: (a) central perspective (8,8); (b) left perspective (8,3); (c) right
perspective (8,13); (d) top perspective (2,8); and (e) bottom perspective (14,8).
Figure 50 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13).
Figure 51 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the bottom perspective (14,8).
Test scenario C.3: Empty park
The perspective shift capability assessment for the light field 360º panorama created for the
evaluation of the test scenario C.3, named Empty park, is reported in Appendix A.
5.2.2. Refocus Capability Assessment
This section assesses the performance of the proposed light field 360º panorama creation solution in
terms of refocusing capability. Similarly to what was done for the evaluation of the perspective shift
capability, some of the light field panoramas created for the cases described in Section 5.1.1 will be
analyzed.
Assessment Conditions
The refocus capability is obtained using the Light Field Toolbox software [51], developed by D.
Dansereau, notably using the function LFFiltShiftSum. This function works by shifting all the available
sub-aperture images of each light field image to the same depth and then adding all the sub-aperture
images together to produce a 2D depth plane extracted from the original light field. The function takes
an input value called slope, which controls the optical focal plane and, thus, which objects appear in
focus. For each created light field panorama presented, several different focal planes are extracted and
presented, as well as some close-ups corresponding to the presented depth planes, to help visualize
the objects in focus in each considered example. In each focal plane presented, the red rectangles
highlight the close-ups that will be used to help visualize the focus in specific parts of the created light
field panorama. In each close-up presented, the red circles highlight the interesting objects in focus.
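The shift-and-sum principle behind LFFiltShiftSum can be sketched as follows. This is a minimal Python illustration under assumed conventions (a light field array of shape (U, V, H, W) and whole-pixel shifts), not the actual Toolbox implementation, which works with sub-pixel interpolation:

```python
import numpy as np

def shift_and_sum_refocus(light_field, slope):
    """Refocus a light field by shifting every sub-aperture image in
    proportion to its (u, v) offset from the central view and then
    averaging all views. The slope plays the same role as the
    LFFiltShiftSum slope parameter: it selects which depth plane ends
    up in focus. For simplicity this sketch rounds the shifts to whole
    pixels (np.roll); the Toolbox performs sub-pixel interpolation."""
    U, V, H, W = light_field.shape
    uc, vc = (U - 1) / 2.0, (V - 1) / 2.0        # central sub-aperture index
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(slope * (u - uc)))    # vertical shift in pixels
            dx = int(round(slope * (v - vc)))    # horizontal shift in pixels
            out += np.roll(light_field[u, v], (dy, dx), axis=(0, 1))
    return out / (U * V)                         # average -> one depth plane
```

A point whose disparity between adjacent views is d pixels is brought into focus by choosing a slope that cancels that per-view shift, while points at other depths remain spread over several pixels, i.e. blurred.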
Panorama by Panorama Refocusing Assessment
Test scenario A.1: Room with Toys 1
In the following, the light field 360º panorama created for the evaluation of the test scenario A.1,
named Room with toys 1, is presented. Figure 52 presents three different depth planes extracted from
the created light field 360º panorama and two corresponding close-ups for each selected depth plane.
Figure 52(a) was extracted with slope = -0.05, and Figure 52(d) and Figure 52(e) are the two
corresponding close-ups; Figure 52(b) was extracted with slope = 0.25, and Figure 52(f) and Figure
52(g) are the two close-ups; lastly, Figure 52(c) was extracted with slope = 0.6, and Figure 52(h) and
Figure 52(i) are the two close-ups.
Figure 52 – Three depth planes extracted from the Room with toys 1 light field 360º panorama and two corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope = -0.05, where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.25, where (f) and (g) are the corresponding
close-ups; (c) depth plane extracted with slope = 0.6, where (h) and (i) are the corresponding close-ups.
As it can be observed from Figure 52, the light field 360º panorama created for the assessment of
the test scenario A.1, named Room with toys 1, presents the desired refocus capability. The images
presented in Figure 52 appear a little bit dark after applying the refocus processing over the sub-aperture
images of the light field 360º panorama created. Observing Figure 52(a) and the two corresponding
close-ups highlighting the objects in focus in the considered depth plane, which are in Figure 52(d) and
Figure 52(e), it is possible to see that the two toy cars (the blue toy car in Figure 52(d) and the grey one
in Figure 52(e)) are in focus and the remaining acquired scene is not (i.e. blurred). Moreover, by
observing Figure 52(b) and the two associated close-ups, which are in Figure 52(f) and Figure 52(g), it
is possible to recognize that the Eiffel Tower and the white toy cars (see Figure 52(f)) and the red car
and the lighthouse toys (see Figure 52(g)) are in focus. Lastly, Figure 52(c) and the two associated
close-ups, which are in Figure 52(h) and Figure 52(i), show the background scene in focus and the rest
of the acquired visual scene blurred. In summary, the light field 360º panorama created presents the
desired a posteriori refocusing capability, which is one of the most important user functionalities involved
in light field 360º panorama creation.
Test scenario A.2: Room with Toys 2
In the following, the light field 360º panorama created for the evaluation of the test scenario A.2,
named Room with toys 2, is presented. Figure 53 presents the last depth plane extracted from
the created light field 360º panorama and two corresponding close-ups. Figure 53(a) was extracted with
slope = 0.6, and Figure 53(b) and Figure 53(c) are the two corresponding close-ups.
Figure 53 - Last depth plane extracted from the Room with toys 2 light field 360º panorama and two corresponding close-ups: (a) depth plane extracted with slope = 0.6, where (b) and (c) are the corresponding close-ups. Red rectangles highlight the close-ups used to help visualize the focus in specific parts of the light field image.
From visual inspection of Figure 53(b) and Figure 53(c), it is possible to conclude that the background
is focused and the remaining objects present in the acquired visual scene are not focused (they are
blurred). This reinforces the fact that, despite the background not being included in the used camera
refocus range (see Section 5.1), the background can be refocused a posteriori and the proposed
solution could still conveniently create the desired light field 360º panorama. This might happen because
the refocus technique considers that all objects at a larger distance than the last depth plane considered
in the used camera refocus range are located at that last depth plane. In addition, this light field 360º
panorama can refocus the objects in the scene in a similar way to the previously presented test case,
i.e. test A.1 with Room with toys 1. Thus, it was decided not to include here again the same three depth
plane examples for this light field 360º panorama as was done for test A.1.
Test scenario B.3: Sea landscape
In the following, the light field 270º panorama created for the evaluation of the test scenario B.3,
named Sea landscape, is presented. Figure 54 presents three different depth planes extracted from the
created light field 270º panorama and one associated close-up for each depth plane. Figure 54(a) was
extracted with a slope = 0.15 and Figure 54(d) is the associated close-up; Figure 54(b) was extracted
with a slope = 0.45 where Figure 54(e) is the associated close-up; lastly, Figure 54(c) was extracted
with a slope = 0.55 and Figure 54(f) is the associated close-up.
As can be observed from Figure 54, the light field 270º panorama created for the assessment of
the test scenario B.3, named Sea landscape, presents the desired refocus capability. By observing
Figure 54(a) and the corresponding close-up where the objects in focus are highlighted (see Figure
54(d)), it is possible to conclude that the person wearing a green t-shirt is focused and the background
scene is blurred. Moreover, by observing Figure 54(b) and the associated close-up (see Figure 54(e)),
it is possible to recognize that the person at the center of the close-up is the only interesting object in
focus. Finally, Figure 54(c) and the associated close-up (see Figure 54(f)) present the background scene
in focus while the rest of the acquired visual scene is not. Thus, the light field 270º panorama created
presents the much desired a posteriori refocusing capability. However, the resolution of the sub-aperture
images created using the Light Field Toolbox [45] (as described in Section 4.2), and thus the resolution
of the final light field panorama, is a limitation when finding very different depth planes to refocus the
interesting objects, since it is not easy to accurately distinguish focus in different interesting objects
when these objects are beyond a certain distance from the camera. This was a bigger limitation for the
test cases B.3 (both Sea landscape and Park landscape) and C.3, since the majority of the objects are
much farther away from the camera than for the test cases A.1 and A.2 (named Room with toys 1 and
Room with toys 2). Furthermore, as previously stated in the assessment of the perspective shift
capability, the camera used (i.e. the Lytro Illum camera [46]) has a rather small and limited lenslet array
that cannot distinguish objects at different depth planes if these objects are at a considerable distance
from the camera.
Figure 54 - Three different depth planes extracted from the Sea landscape light field 270º panorama and one corresponding close-up for each depth plane extracted: (a) depth plane extracted with slope = 0.15, where (d) is the corresponding close-up; (b) depth plane extracted with slope = 0.45, where (e) is the corresponding close-up; (c) depth plane extracted with slope = 0.55, where (f) is the corresponding close-up.
Test scenario B.3: Park landscape
In the following, the light field 270º panorama created for the evaluation of the test scenario B.3,
named Park landscape, is presented. Figure 55 presents three different depth planes extracted from the
created light field 270º panorama and the corresponding close-ups. Figure 55(a) was extracted with
slope = 0, and Figure 55(d) and Figure 55(e) are the two corresponding close-ups; Figure 55(b) was
extracted with slope = 0.15, and Figure 55(f) and Figure 55(g) are the two close-ups; lastly, Figure
55(c) was extracted with slope = 0.25, and Figure 55(h) is the corresponding close-up.
Observing Figure 55, it may be concluded that the light field 270º panorama created for the
assessment of the test scenario B.3, named Park landscape, presents the desired refocusing capability.
Looking at Figure 55(d) and Figure 55(e), it is possible to see that the girl wearing a red top, together
with the metal hand support, is focused and the background scene is blurred. By visual inspection of
Figure 55(f) and Figure 55(g), it is possible to notice that the person at the center of the close-up is the
only interesting object in focus. Finally, Figure 55(h) presents the girl wearing a grey top and black jeans
in focus while the rest of the scene is not. Thus, the light field 270º panorama created presents the
desired a posteriori refocusing capability.
Figure 55 - Three different depth planes extracted from the Park landscape light field 270º panorama and two corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope = 0 where (d) and (e) are the corresponding close-ups; (b) depth plane extracted with slope = 0.15 where (f) and (g) are the corresponding
close-ups; (c) depth plane extracted with slope = 0.25 where (h) is the corresponding close-up.
Test scenario C.3: Empty park
The refocus capability assessment for the light field 360º panorama created for the assessment of
the test scenario C.3, named Empty park, is reported in Appendix A.
Chapter 6
6. Summary and Future Work
In this chapter, a summary of the work performed in the context of this Thesis is presented, followed by
a highlight of its main conclusions. Then, some suggestions for the future work are presented.
6.1. Summary and Conclusions
The motivation of this work is the enhancement of 360º panoramic photography with additional features
such as refocusing. The major limitations of conventional cameras are: 1) the poor imaging
representation model used, which is the conventional 2D trichromatic image; and 2) the fact that
conventional cameras only capture the total sum of the light rays that reach the same point in the lens,
using only the two dimensions available at the camera sensor, instead of capturing the amount of light
carried by each single light ray. Thus, important visual information of the acquired scene is irreversibly
lost. These limitations led to the emergence of new sensors and cameras (i.e. light field cameras)
adopting higher-dimensional representations of the visual information (more faithful and complete
imaging representation models). These new representations are reinventing the concepts and
functionalities associated with panoramic image creation solutions. Thus, the major objective of this
Thesis is the development of a light field based 360º panorama image creation solution.
This Thesis first introduces the main concepts, approaches and tools related to the development
of conventional 360º panoramic images, namely a global 360º panorama creation architecture, followed
by a description of several different types of 360º panoramas and the key conventional panorama
creation solutions available in the literature. Although there are not many light field panoramic solutions,
the first methodologies and tools associated with the creation of 360º panoramas exploiting the light
field imaging representation are described next.
The light field 360º panorama creation solution proposed in this Thesis is named multi-perspective
image stitching and is inspired by the work developed by Brown and Lowe [24]. The main concept behind
this solution is to create light field 360º panoramas from a collection of 2D perspective panoramas. Each
2D perspective panorama is created by stitching all corresponding perspective images from the different
perspective image stacks (i.e. light field sub-aperture images). The conventional 360º panorama creation
architecture was adapted to deal with light field input; thus, for the stitching to be coherent among
sub-aperture images, it was necessary to calculate the key registration and composition parameters only
for the central view, which are then applied to the other views of the collection of light field images
that compose the final light field panorama.
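The core of this adaptation, estimating the registration once on the central sub-aperture images and reusing it for every other perspective, can be sketched as below. This is a schematic under stated assumptions: `estimate_homography` and `warp` are placeholders standing in for the actual feature-based registration (features, RANSAC, bundle adjustment) and compositing stages of the pipeline, not the Thesis implementation itself.

```python
import numpy as np

def stitch_light_field(light_fields, estimate_homography, warp):
    """Stitch a list of light fields (each of shape (U, V, H, W)) into a
    multi-perspective panorama. Registration parameters are estimated
    ONCE, between the central sub-aperture images, and then reused to
    warp every other perspective, which keeps the stitching coherent
    across all views of the final light field panorama."""
    U, V = light_fields[0].shape[:2]
    uc, vc = U // 2, V // 2
    # 1) Register only the central views (abstracted behind the
    #    estimate_homography placeholder).
    homographies = [estimate_homography(light_fields[0][uc, vc], lf[uc, vc])
                    for lf in light_fields]
    # 2) Reuse the same homographies for every sub-aperture position.
    panorama = {}
    for u in range(U):
        for v in range(V):
            views = [warp(lf[u, v], H) for lf, H in zip(light_fields, homographies)]
            panorama[(u, v)] = np.mean(views, axis=0)  # stand-in for blending
    return panorama
```

Reusing one set of homographies is what guarantees that corresponding pixels in different sub-aperture panoramas stay aligned, preserving the disparity structure needed for refocusing.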
The performance assessment of the proposed multi-perspective image stitching solution is made
with relevant test scenarios proposed for the first time in this Thesis, which are critical for the adequate
assessment of the proposed solution. These test scenarios attempt to reproduce relevant acquisition
conditions of a common user in a given real scenario. Each created light field panorama presents a few
stitching artifacts. The experimental results obtained show both light field refocus and multi-perspective
capabilities in the panoramic images created with the multi-perspective solution. In this context, it is
possible to conclude that the proposed multi-perspective image stitching solution allows the creation of
light field 360º panoramas under different types of realistic scenarios. Also, both light field refocus and
multi-perspective capabilities are available in all light field panoramas created. From the perspective
shift capability assessment, it is possible to conclude that: 1) the light field panoramas acquired in visual
scenes where the objects are close to the camera present larger perspective shifts in both the horizontal
and vertical directions, which is justified by the fact that objects close to the camera present higher
disparity than objects far away from the camera; 2) the design of the light field camera used (i.e. the
Lytro Illum camera) does not allow capturing large amounts of disparity between different perspectives.
From the refocus capability assessment, it can be deduced that: 1) the light field panoramas created
can be refocused on different objects present in the acquired visual scene at the user's choice; 2) if the
objects in the acquired visual scene are very distant from the camera, they will present very small
disparities, which can compromise the light field refocus capability because, in this case, the refocus
technique considers that the depth of all scene objects is the same; 3) the resolution of the sub-aperture
images created using the Light Field Toolbox [45] (and thus the resolution of the final light field 360º
panorama) is a considerable limitation when finding very different depth planes to refocus the scene
objects, since it is not easy to accurately distinguish focus in different objects if these objects are beyond
a certain distance from the camera. Also, the captured light field images and, consequently, the
sub-aperture images become overexposed when acquiring a bright area of the visual scene. This was a
considerable limitation of the light field camera used when acquiring this type of visual scene, and every
light field panorama created presents this type of problem. Considering all the results obtained, one of
the major conclusions of this Thesis is that the creation of light field panoramas excels for visual scenes
containing objects close to the camera.
The light field 360º panorama creation solution developed is able to maintain the desired refocus and
perspective shift capabilities in the light field panoramas created. However, there are important
limitations that may be addressed to improve the proposed multi-perspective solution, as explained
next, in the future work Section.
6.2. Future Work
Since the light field imaging representation is a relatively new topic there are not many panorama
creation solutions based on light field images and thus, it is expected that new and innovative light field
360º panorama creation techniques will be proposed in the future. Regarding the proposed solution,
some improvements are possible to increase the quality of the light field 360º panoramas created. Some
suggestions aiming to improve the developed solution are listed:
Depth-based Light Field Panorama Creation: To minimize the stitching errors and properly
capture the disparity of both objects close to the camera and objects far away, it is possible to
improve the stitching process by: 1) estimating the depth of the acquired visual scene in each
light field image used; and 2) using this information in the registration process by estimating
multiple homographies for regions of the image which are in different depth planes, thus enabling
a more accurate multi-perspective stitching process [40].
Light Field Panorama Rendering: Another topic is the development of a rendering tool
appropriate for light field panoramas, giving the user the possibility to interact with the light field
360º panorama content, e.g. using the mouse to rotate the view in all directions or to navigate
through the whole acquired visual scene, making zoom-ins and zoom-outs, etc., to enjoy a more
immersive user experience. In addition to the usual interactions with conventional panoramas, the
visual scene could be rendered with a certain depth of field, and minor perspective adjustments
could be allowed. This type of rendering tool could also be relevant for the visualization of light
field panoramas while giving the user a depth impression, e.g. rendering the content in
stereoscopic or virtual reality head mounted displays.
Unrestricted Light Field Panorama Creation: Another topic that could be interesting to improve
the proposed solution is the creation of light field 360º panoramas in an unrestricted way, i.e.
moving the camera handheld and thus with some unrestricted camera rotation and translation
motion. The tripod-based scenario used here assumes that the camera undergoes a pure rotation
around its no-parallax point and is very common among professional photographers. However,
there are many solutions which do not have this constraint (e.g. using smartphone cameras) and
thus it is important to also target these cases.
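The depth-based stitching suggestion above can be sketched as follows; this is an illustrative layered-warping scheme under assumptions (the `warp` callable, the layer boundaries and the per-layer homographies are hypothetical placeholders, not a method described in the Thesis):

```python
import numpy as np

def depth_layered_warp(image, depth_map, layer_homographies, depth_edges, warp):
    """Sketch of depth-based registration: the image is split into depth
    layers and each layer is warped with its own homography, instead of
    applying a single homography to the whole image. `warp` applies a
    3x3 homography to an image and stands in for a real warper."""
    out = np.zeros_like(image, dtype=float)
    for k, H in enumerate(layer_homographies):
        # Mask of pixels whose depth falls into the k-th depth interval.
        mask = (depth_map >= depth_edges[k]) & (depth_map < depth_edges[k + 1])
        out[mask] = warp(image, H)[mask]
    return out
```

Warping each depth layer with its own homography would let close objects and far objects be registered with their different apparent motions, reducing parallax-induced stitching errors.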
Appendix A
A. Test Scenario C.3 Named Empty Park:
Perspective Shift Capability Assessment
In this chapter, the light field 300º panorama created for the evaluation of test scenario C.3, named
Empty park, is presented.
Perspective Shift Capability Assessment
For the evaluation of this test, it is not necessary to present the five different perspectives highlighted
in red in Figure 43, since it is almost impossible to recognize the difference in perspective between them.
Instead, only some close-ups for the five perspectives highlighted in red in Figure 43 are depicted. In
Section 5.1, the central perspective (8,8) of the light field panorama created for this test has been
presented. Figure 56 presents the horizontal perspective shift close-ups extracted from each
perspective panorama. Figure 57 presents the vertical perspective shift close-ups.
By observing Figure 56 and Figure 57, with the horizontal and vertical perspective shift close-ups,
respectively, it is possible to conclude that the light field 300º panorama created for the evaluation of
test scenario C.3, named Empty park, does not present noticeable horizontal and vertical perspective
shifts. From Figure 56(a), Figure 56(b) and Figure 56(c), it is practically impossible to notice a horizontal
perspective shift by looking at the red line in each close-up. The same happens when observing Figure
56(d), Figure 56(e) and Figure 56(f). For the vertical direction, the same happens, as the perspective
shifts are very small and almost imperceptible (compare the vertical perspective shift in Figure 57(a),
Figure 57(b) and Figure 57(c) for the first close-up, or Figure 57(d), Figure 57(e) and Figure 57(f) for
the second close-up). As previously stated, this occurs because the lenslet array inserted in the Lytro
Illum camera [46] is rather small and limited. Thus, the light field panoramas acquired in visual scenes
where the interesting objects are very far away from the camera (which is the case for this test scenario)
will present disparities that are almost unnoticeable, thus leading to very small perspective differences.
This fact will compromise the refocus capability, as will be seen in the next section. The farther the
objects are from the camera, the less disparity it is possible to observe and the smaller the perspective
shifts (in both the horizontal and vertical directions of the sub-aperture image) it is possible to visualize.
Figure 56 – Horizontal perspective shift close-ups: (a) and (d) correspond to the two close-ups from the left perspective (8,3); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the right perspective (8,13).
Figure 57 - Vertical perspective shift close-ups: (a) and (d) correspond to the two close-ups from the top perspective (2,8); (b) and (e) correspond to the two close-ups from the central perspective (8,8); lastly (c) and (f) correspond to the two close-ups from the bottom perspective (14,8).
Refocus Capability Assessment
Figure 58 presents two different depth planes extracted from the created light field 300º panorama and
the corresponding close-ups. Figure 58(a) was extracted with slope = 0.25, and Figure 58(c) and Figure
58(d) are the two corresponding close-ups; Figure 58(b) was extracted with slope = 0.5, and Figure
58(e) and Figure 58(f) are the two close-ups.
Figure 58 - Two different depth planes extracted from the Empty park light field 300º panorama and two corresponding close-ups for each depth plane extracted: (a) depth plane extracted with slope = 0.25, where (c) and (d) are the corresponding close-ups; (b) depth
plane extracted with slope = 0.5, where (e) and (f) are the corresponding close-ups.
As expected, by observing Figure 58, it may be concluded that the light field 300º panorama created
for the evaluation of the test scenario C.3, named Empty park, does not present the desired refocus
capability. Looking at Figure 58(c) and Figure 58(d), it is possible to see two close-ups of a depth plane
extracted from the created light field image that look sharply focused, while Figure 58(e) and Figure
58(f) present two close-ups from an example depth plane that is blurred. This occurs because all the
objects in the acquired visual scene are very distant from the camera; thus, the refocus technique
assumes that the depth of these objects is the same, so it is not possible to extract various different
depth planes from the created light field image. As previously stated in the perspective shift capability
assessment, each acquired light field captures very small disparities, thus compromising the refocus
capability of the created light field panorama. This is related to the design of the Lytro Illum camera that
was used, as previously explained for the other test cases presented.
Bibliography
[1] "Lytro web page," [Online]. Available: https://www.lytro.com/. [Accessed 28 12 2015].
[2] "Raytrix web page," [Online]. Available: http://www.raytrix.de/. [Accessed 28 12 2015].
[3] E. Adel, M. Elmogy and H. Elbakry, "Image Stitching Based on Feature Extraction Techniques: A
Survey," International Journal of Computer Applications, vol. 99(6), pp. 1-8, August 2014.
[4] K. Shashank, N. Siva Chaitanya, G. Manikanta, Ch. N. V. Balaji and V. V. S. Murthy, "A Survey and
Review Over Image Alignment and Stitching Methods," International Journal of Electronics &
Communication Technology, vol. 5, pp. 50-52, March 2014.
[5] Z. Zhang, "A Flexible New Technique for Camera Calibration," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 22, pp. 1330-1334, November 2000.
[6] R. Szeliski, "Image Alignment and Stitching: A Tutorial," Foundations and Trends in Computer
Vision , 2006.
[7] J. R. Bergen, P. Anandan and K. J. &. H. R. Hanna, "Hierarchical Model-Based Motion Estimation,"
in Proceedings of the Second European Conference on Computer Vision, Santa Margherita
Liguere, Italy, 1992.
[8] R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010.
[9] J. Davis , "Mosaics of Scenes with Moving Objects," in IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’1998), Santa Barbara, June 1998.
[10] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of
Computer Vision, vol. 60, pp. 91-110, November 2004.
[11] R. Karthik, A. AnnisFathima and V. Vaidehi, "Panoramic View Creation using Invariant
Momentsand SURF Features," in IEEE International Conference on Recent Trends in Information
Technology (ICRTIT'2013), Chennai, India, July, 2013.
[12] M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A paradigm for Model Fitting with
Applications to Image Analysis and Automated Cartography," Communications of the ACM, vol.
24(6), pp. 381-395, June 1981.
[13] P. J. Rousseeuw, "Least Median of Squares Regresssion," Journal of the American Statistical
Association, vol. 79, pp. 871-880, 1984.
[14] "Understanding Projecting Modes," Kolor, [Online]. Available: http://www.kolor.com/wiki-
en/action/view/Understanding_Projecting_Modes. [Accessed 15 11 2015].
[15] P. F. McLauchlan and A. Jaenicke, "Image Mosaicing using Sequential Bundle Adjustment," Image
and Vision Computing, vol. 20, pp. 751-759, August 2002.
[16] M. Brown and D. Lowe, "Recognizing Panoramas," in Ninth International Conference on Computer
Vision (ICCV’2003), Nince, France, October 2003.
88
[17] H.-Y. Shum and R. Szeliski, "Panoramic Image Mosaics," Microsoft Research , Redmond, WA,
USA, 1997.
[18] J. Brosz and F. Samavati, "Shape Defined Panoramas," in Shape Modeling International
Conference (SMI), Aix-en-Provence, France, 2010.
[19] "Panoramic Image Projections," [Online]. Available:
http://www.cambridgeincolour.com/tutorials/image-projections.htm. [Accessed 24 09 2015].
[20] "Panorama Projections," [Online]. Available: http://wiki.panotools.org/Projections. [Accessed 24 09
2015].
[21] "PTAssembler Projections," PTAssembler, February 2009. [Online]. Available:
http://www.tawbaware.com/projections.htm. [Accessed 15 11 2015].
[22] "Some Projections Created with Higin Software," [Online]. Available:
http://www.360facil.com/eng/360-degree-photo-other-projection-panorama-edition.php.
[Accessed 22 09 2015].
[23] H.-Y. Shum and R. Szeliski, "System and Experiment Paper: Construction of Panoramic Image
Mosaics with Global and Local Alignment," International Jornal of Computer Vision, vol. 36(2), pp.
101-130, February 2000.
[24] M. Brown and D. G. Lowe, "Automatic Panoramic Image Stitching using Invariant Features,"
International Journal of Computer Vision, vol. 74(1), pp. 59-73, 2007.
[25] J. S. Beis and D. G. Lowe, "Shape Indexing using Approximate Nearest-Neighbour Search in High-
Dimensional Spaces," in Proceedings of the Interational Conference on Computer Vision and
Pattern Recognition (CVPR'1997), San Juan, Puerto Rico, 1997.
[26] R. Harley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd edn, New York:
Cambridge University Press, 2004.
[27] W. M. P. H. R. a. F. A. Triggs, "Bundle Adjustment: A Modern Synthesis," in Vision Algorithms:
Theory and Practice, number 1883 in LNCS, Corfu, Greece, Springer-Verlag., 1999, pp. 298-373.
[28] R. K. S. B. Szeliski, "Recovering 3D Shape and Motion from Image Streams using Nonlinear Least
Squares.," Journal of Visual Communication and Image Representation 5, vol. 1, pp. 10-28, March,
1994.
[29] P. Burt and E. Adelson, "A Multiresolution Spline with Application to Image Mosaics," ACM
Transactions on Graphics, vol. 2(4), pp. 217-236, 1983.
[30] A. Eden, M. Uyttendaele and R. Szeliski, "Seamless Image Stitchin of Scenes with Large Motions
and Exposure Differences," in IEEE Computer Society Conference on Computer Vision and
Pattern Recognition (CVPR'2006), New York, NY, USA, June 2006.
[31] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin and M.
Cohen, "Interactive Digital Photomontage," in ACM SIGGRAPH, Los Angeles, CA, USA, 2004.
[32] T. Mitsunaga and S. Nayar, "Radiometric Self Calibration," in IEEE Conference on Computer
Vision and Pattern Recognition (CVPR'1999), Fort Collins, CO, June, 1999.
89
[33] "Wikipedia on Exchangeable image file format," [Online]. Available:
https://en.wikipedia.org/wiki/Exchangeable_image_file_format. [Accessed 28 12 2015].
[34] Y. Boykov, O. Veksler and R. Zabih, "Fast Approximate Energy Minimization via Graph Cuts," in
IEEE Transactions on Pattern Analysis and Machine Intelligence, Kerkyra, Greek, 2001.
[35] V. Kolmogorov and R. Zahib, "What Energy Functions Can Be Minimized via Graph Cuts?,"
Transactions on Pattern Analysis and Machine Intelligence, vol. 26(2), pp. 147-159, 2004.
[36] J. Zaragoza, T.-J. Chin, Q.-H. Tran, M. S. Brown and D. Suter, "As-Projective-As-Possible Image
Stitching with Moving DLT," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
36(7), pp. 1285-1298, July 2014.
[37] F. Pereira, "Efficient Plenoptic Imaging: Why Do We Need It?," in Submitted to IEEE International
Conference on Multimedia and Expo (ICME'2016), Seattle, USA, 2016.
[38] E. H. Adelson and J. R. Bergen, The Plenoptic Function and the Elements of Early Vision, M. L. a.
J. A. Movshon, Ed., Massachusetts: The MIT Press, Cambridge, Mass., 1991, pp. 3-20.
[39] M. Levoy, "Light Fields and Computational Imaging," IEEE Computer, Vols. 38, no. 8, pp. 46-55,
2006.
[40] W. Lu, W. K. Mok and J. Neiman, "3D and Image Stitching with the Lytro Light-Field Camera,"
New York, NY, 2013.
[41] B. Esfahbod, "Behnam Esfahbod's open source projects," [Online]. Available:
http://code.behnam.es/. [Accessed 30 12 2015].
[42] C. Birklbauer and O. Bimber, "Panorama Ligh-Field Imaging," Computer Graphics Forum, vol.
33(2), p. 43–52, 2014.
[43] D. G. Dansereau, O. Pizarro and S. B. Williams, "Decoding, Calibration and Rectification for
Lenselet-Based Plenoptic Cameras," in IEEE Computer Vision and Pattern Recognition
(CVPR'2013), Portland, OR, June 2013.
[44] C. Birklbauer, S. Opelt and O. Bimber, "Rendering Gigaray Light Fields," Computer Graphics
Forum, vol. 32, pp. 469-478, 2013.
[45] D. G. Dansereau, "Light Field Toolbox v0.4 for MATLAB," [Online]. Available:
http://www.mathworks.com/matlabcentral/fileexchange/49683-light-field-toolbox-v0-4. [Accessed
March 2016].
[46] "Lytro Web Page," [Online]. Available: https://www.lytro.com/. [Accessed 28 12 2015].
[47] "Nodal Ninja Web Page," [Online]. Available: http://shop.nodalninja.com/. [Accessed 01 08 2016].
[48] "Manfrotto Web Site," [Online]. Available: https://www.manfrotto.com/. [Accessed 01 08 2016].
[49] "Bayer-Pattern Filter," [Online]. Available: https://keyassets.timeincuk.net/inspirewp/live/wp-
content/uploads/sites/13/2014/12/Bayer-filter.jpg. [Accessed 17 March 2016].
[50] A. Kondoz and T. Dagiuklas, Eds., Novel 3D Media Technologies, Springer-Verlag New York,
2015.
[51] D. G. Dansereau, "Light Field Toolbox for MATLAB," February, 2015, Thecnical Report.
90
[52] "Light Field Photography," [Online]. Available: http://tdistler.com/2010/09. [Accessed 17 March
2016 ].
[53] "Bayer Demosaicing," [Online]. Available: http://www.cambridgeincolour.com/tutorials/camera-
sensors.htm. [Accessed 01 08 2016].
[54] "OpenCV 2.4.12 Stitching API," [Online]. Available:
http://docs.opencv.org/2.4.12/modules/stitching/doc/stitching.html. [Accessed 01 03 2016].
[55] M. Uyttendaele, A. Eden and R. Szeliski, "Eliminating Ghosting and Exposure Artifacts in Image
Mosaics," in Computer Vision and Pattern Recognition, 2001. CVPR'01, Kauai, HI, USA, 2001.
[56] V. Kwatra, A. Schõdl, I. Essa, G. Turk and A. Bobick, "Graphcut Textures: Image and Video
Synthesis Using Graph Cuts," in SIGGRAPH'03, San Diego, California, USA, July 2003.
[57] "Light Field Camera System," [Online]. Available:
http://photo.stackexchange.com/questions/13378/what-are-the-basic-workings-of-the-lytro-light-
field-camera. [Accessed 01 08 2016].