Veillametrics: An extramissive approach to analyze and visualize audio and visual
sensory flux
Sen Yang
A thesis submitted in conformity with the requirements
for the degree of Master of Applied Science
The Edward S. Rogers Sr. Department of Electrical & Computer Engineering
University of Toronto
© Copyright by Sen Yang 2018
Veillametrics: An extramissive approach to analyze and visualize audio and visual sensory flux
Sen Yang Master of Applied Science
The Edward S. Rogers Sr. Department of Electrical & Computer Engineering University of Toronto
2018
Abstract
Veillametrics is the study of sensors and their sensing capacities. This thesis describes the methodologies
employed to capture the ability-to-sense (veillance) of various arbitrary audio and visual sensors through
space, in order to visualize, quantitatively generalize, and model sensory veillance. Using the veillametry
framework, 3D models can be rendered to visualize the quantity of exposure to sensing projected onto
various environmental surfaces. The work is extended to approximate the veillance model of a human eye
using non-intrusive eye tests based on eye-gaze absement metrics. Such veillance models can extend
traditional eye tracking methods to reveal greater detail about the biosensor and its range and capacity to
sense than simple gaze directions alone. The thesis contrasts the extramissive (emissive) nature of
veillametry with intromissive theories, and gives a framework to model phenomena such as
optical power decay, blurring, and the amount of independent information captured by a sensor array.
Acknowledgements
My many thanks to the friends and colleagues who have made my study experience a memorable one. During
my two years of graduate studies in the Humanistic Intelligence Lab, I have had the privilege of meeting
and working with many encouraging and supportive colleagues and mentors. I would like to thank Professor
Steve Mann for his mentorship. His philosophy and passion for mixing art, science, math, and engineering
continue to inspire members of the lab to strive for artistic perfection in their own work. He showed me
that there are many interesting, holistic links between nature, art, and science that should be integrated
rather than segregated.
My acknowledgements to Ryan for his scientific insights and mentorship on the various projects we
worked on together, and for teaching me how to write, think, and present scientifically. Furthermore, I would like to
thank Adnan, Johannas, Kaihua, Jacopo, and Sarang for a wonderful experience creating various High
Dynamic Range algorithms and applications together.
I would also like to thank Max, Jack, Cindy, Byron, Jackson, Francisco and Alex for the wonderful
experience working together on the Open EyeTap project. I had an enriching year programming and
designing with this brilliant and motivated team, creating various wearable devices such as thermal
camera glasses, auto dimming glasses, radar glasses, and many others.
To Kyle and Alex, thank you for assisting me and giving me advice on designing and building the electronic
circuits and wearables used for the plotters described later in this thesis.
I would finally like to thank my family and friends for their encouragement and support during my
studies. Their words of encouragement have carried me a long way.
Contents
List of Figures vii
Chapter 1: Introduction and previous work 1
  1.1 Motivation and goal 1
  1.2 Previous veillametry work 2
    1.2.1 The wireless television and camera phenomenon 2
    1.2.2 Light painting, SWIM, and abakography as an art form 4
    1.2.3 Politics of veillance: surveillance and sousveillance 6
    1.2.4 Previous work in veillametry 8
  1.3 Veillametry applications 11
  1.4 Veillance classification and thesis objective 15
  1.5 Thesis organization 16
Chapter 2: Gathering veillance data 17
  2.1 Methodology to measure units of veillance 17
  2.2 Generic experimental setup 17
    2.2.1 3-Dimensional cartesian plotter 19
    2.2.2 3-Dimensional delta plotter 21
    2.2.3 2-Dimensional cartesian plotter 23
  2.3 Gathering photodiode veillance data 24
  2.4 Gathering camera veillance data 25
    2.4.1 High dynamic range (HDR) imaging techniques 25
    2.4.2 Simplified HDR method during runtime 34
  2.5 Gathering audio veillance data 35
    2.5.1 Experimental setup 35
    2.5.2 Time invariant waves - “Sitting waves” 36
  2.6 Summary 37
Chapter 3: Data visualization and analysis 38
  3.1 Visualizing video veillance data 38
    3.1.1 Data preparations and color maps 38
    3.1.2 Video camera veillance 39
    3.1.3 Analyzing photodiode veillance data with optics model 40
  3.2 Visualizing audio veillance data 46
    3.2.1 Data preparations and color maps 47
    3.2.2 Audio veillance 48
  3.3 Summary 50
Chapter 4: Veillons and extramission theory 51
  4.1 Veillametry and the extramission model 51
    4.1.1 Veillance definition 51
    4.1.2 Extramission model, veillons, and vixels 51
  4.2 Veillance flux density, degeneracy, and energetic optics comparisons 54
  4.3 Summary 56
Chapter 5: Veillograms 57
  5.1 Camera veillance field formulation 57
    5.1.1 Cameras with barrel or pincushion distortions 58
    5.1.2 Cameras with vignetting effects 59
  5.2 Surface definition, formulation, and detection 59
    5.2.1 Marker tracking using ArUco codes 60
    5.2.2 Surface tracking limitations 61
  5.3 3D geometry and ray tracing techniques 62
  5.4 Veillance bucketing, colour mapping and 3D modelling 63
  5.5 Summary 64
Chapter 6: Bioveillograms 65
  6.1 Bioveillance - human eyes as veillance-rich sensors 65
  6.2 Eye tests and bioveillance modelling 66
    6.2.1 Model hypothesis based on human anatomy 66
    6.2.2 Previous experiment setup 68
    6.2.3 New experiment with application of absement 69
  6.3 Eye tracker implementation 73
    6.3.1 Eye tracker basics 73
    6.3.2 Eye tracker hardware implementation 75
    6.3.3 Eye tracker software implementation 76
    6.3.4 Eye tracker calibration 78
    6.3.5 Gaze estimation 80
  6.4 Creating veillograms from the human eye 83
  6.5 Improved equipment design using the eyetap principle 87
  6.6 Summary 91
Chapter 7: Vixel distributions and blurring 92
  7.1 Vixel definition, spatial resolution, and vixel overlap 93
  7.2 Method to measure vixel distribution and overlap 94
    7.2.1 Ideal vixel distribution 95
    7.2.2 Experimental setup for measuring veillance distribution in single vixel 95
    7.2.3 Experimental setup for measuring vixel overlap 97
  7.3 Optical blurring and vixel overlap 99
  7.4 Image deblurring using vixel distribution matrix 103
  7.5 Upgrading veillogram renders 103
  7.6 Veillametric formulations on sensory flow 104
  7.7 Summary 105
Chapter 8: Conclusion 107
  8.1 Contribution 107
  8.2 Future work 107
References 109
List of Figures
Figure 1 SWIM visualization of waves transmitted by a phone P002
Figure 2 Early experimental setup for tracing out veillance field of camera P003
Figure 3 Photographs of early veillance field from a camera P004
Figure 4 Effects of exposure time in photographs P004
Figure 5 Early light painting techniques and the SWIM P005
Figure 6 Examples of veillance imbalance and veillance hypocrisy P006
Figure 7 Conceptual figure on sousveillance and surveillance P007
Figure 8 Examples of veillance fields from a photodiode P008
Figure 9 Examples of coloured veillance fields from multiple photodiodes P009
Figure 10 Examples of veillance fields from infrared LEDs P010
Figure 11 SWIM visualization of radio waves and RADAR waves P010
Figure 12 Early SWIM visualization of radio waves P011
Figure 13 SWIM visualization of non-spatial dependent signals P011
Figure 14 Veillance application in mediated reality gaming P012
Figure 15 Veillance application in product design and defect detection P012
Figure 16 Veillance application in improved sensory field and attention tracking P013
Figure 17 Veillogram setup and computer generated 3D models P014
Figure 18 Conceptual figure on vixel distributions and optical blurring P015
Figure 19 Figure introducing the quantification of veillance fields P016
Figure 20 General system diagram for veillance recording prototypes P018
Figure 21 Labelled photographs of 3D cartesian plotter P019
Figure 22 Abakographs of chirplets produced by the 3D cartesian plotter P020
Figure 23 Photo and diagram of 3D delta plotter P021
Figure 24 Abakograph of a chirplet produced by the 3D delta plotter P022
Figure 25 Labelled photograph of 2D cartesian plotter P023
Figure 26 A peek view of visualized quantified veillance field P024
Figure 27 A comparison of overexposed, underexposed and HDR photographs P026
Figure 28 Diagram showing the signal processing of a camera and CRT TV P027
Figure 29 Figure showing how a comparagram is produced for HDR processing P028
Figure 30 A comparagram fitted with a compressor function P029
Figure 31 Response curves and response derivatives as certainty functions P030
Figure 32 HDR process for photoquantity estimation using multiple exposures P031
Figure 33 HDR composite of multiple exposures of an LED collected by a photosensor P033
Figure 34 Experimental setup for wave (audio) veillametry P035
Figure 35 Measuring the speed of sound using abakographs P037
Figure 36 Data visualization colour mapping schema P038
Figure 37 Veillance field visualized for a digital camera P039
Figure 38 3D camera veillance visualized as 3D point clouds P040
Figure 39 Visualization of photodiode veillance field P041
Figure 40 Fitting veillance data into an inverse square of distance model P042
Figure 41 Figure illustrating predictable error patterns from experiment P043
Figure 42 Figure marking other valid data points for model verification P043
Figure 43 Fitting validation data into fitted model P044
Figure 44 Veillance plane near parallel to optical axis of a camera visualized P045
Figure 45 Field of view of optical systems P046
Figure 46 Visualization of audio veillance field P047
Figure 47 Colour schema generator program user interface P048
Figure 48 Visualization of audio veillance with interference patterns P049
Figure 49 Audio veillance renders that are coloured with phase shifts for animation P049
Figure 50 Visualization of audio veillance with quality microphone P050
Figure 51 3D audio veillance data visualized as 3D point clouds P050
Figure 52 Figure and diagram visualizing extramissive optics concept P052
Figure 53 Conceptual figure of the photon, darkon, and the veillon P053
Figure 54 Conceptual figure visualizing spatial resolution P055
Figure 55 Comparative figure of a veillance flux and a light source P056
Figure 56 Conceptual diagram for modelling veillance field vectors of camera P057
Figure 57 Illustration of the pincushion and the barrel distortion effects P058
Figure 58 Illustration on vignetting effects P059
Figure 59 Applications of ArUco markers for modelling 3D surfaces P060
Figure 60 Examples of 3D modelling and reality augmentation using ArUco P061
Figure 61 Surface tracking limitations P061
Figure 62 Camera veillogram colour mapping schema P064
Figure 63 Examples of generated texture maps for each surface P064
Figure 64 Examples of rendered 3D veillograms on OpenGL P064
Figure 65 Distribution of rods and cones of the human eye P066
Figure 66 Conceptual figures on modelling bioveillance P067
Figure 67 Diagrams of experiments for obtaining quantified bioveillance measures P068
Figure 68 Computer generated veillance flux cross-section P069
Figure 69 Introduction screen of proposed bioveillance measurement software P070
Figure 70 Examples of visual stimuli that appear on the software screen P070
Figure 71 Visualization of bioveillance based on eye displacement P072
Figure 72 Effects of dark and bright pupil illumination methods for eye tracking P073
Figure 73 Camera and IR LED positioning to obtain dark and bright pupil effects P074
Figure 74 Labelled diagram of eye tracking system prototype P075
Figure 75 Intermediate software steps for eye tracking algorithm P077
Figure 76 Eye position prediction augmented to photograph of the eye in real time P077
Figure 77 Eye tracker calibration user interface P079
Figure 78 Visualization of calibration point mass centers P080
Figure 79 Photograph of patterned board used for gaze calibration P081
Figure 80 Gaze calibration data points augmented onto photograph of the eye P082
Figure 81 Front view of the gaze tracking, surface detecting prototype P084
Figure 82 Bioveillance setup and 3D model visualizations P086
Figure 83 Conceptual diagram for incorporating the eyeTap principle into the current prototype P087
Figure 84 Example images of the eyeTap design P088
Figure 85 Labelled diagram of newer prototype using the eyeTap design P089
Figure 86 Prototype example from Open eyeTap glasses P090
Figure 87 Optical design challenges for eye tracking in eyeTap glasses P091
Figure 88 Current issues with veillogram visualizations P092
Figure 89 Conceptual figure visualizing vixels through space P093
Figure 90 Conceptual figure on vixel regions and vixel sensitivity P094
Figure 91 Experimental setup for measuring veillance power distribution P095
Figure 92 Veillance power distribution over vixel region in uncontrolled environment P096
Figure 93 Veillance power distribution over vixel region from a darkroom P097
Figure 94 Conceptual figure on vixel sensitivity overlap P098
Figure 95 Figure showing regions of veillance overlap of neighbouring pixels P098
Figure 96 Example of blurring kernel matrix and image blurring P099
Figure 97 Visualized veillance distributions for various camera blur settings P100
Figure 98 Ray tracing diagram that explains blurring using extramission concepts P101
Figure 99 Conceptual diagram for various cases of vixel overlaps P102
Chapter 1: Introduction and previous work
This chapter introduces the concept of veillance, the sensing ability of sensors through space.
It explains veillance through previous work, and through the motivations for and applications of
veillametry. The chapter concludes with a section on thesis organization and structure.
1.1 Motivation and goal
An increasing number of electronic sensors such as cameras, microphones, and wireless transceivers are
being used in today's developing commercial products relating to smart wearables, mediated reality
devices, Internet of Things systems, smart cities with surveillance technology, smart home appliances,
telecommunication devices, and many others. [1][2] For these systems to function, some level of
interaction is required between users and the systems, from one system to another, or between a system
and its environment, whether through audio, visual, transmitted wireless signals, or other means. [3][4][5]
Many of these sensors exhibit non-uniform directional sensitivity (such as the field of view of a camera,
or the range over which an IR proximity sensor operates) that is also angle, space, time, and/or obstacle
dependent. All of these possible dependencies make the sensory flux models complex and rich, rather
than simple, uniform-range or binary sensitivity patterns. [6][7][8][9]
Understanding these complex sensory boundaries and sensing capacities is important for general
product design. [10][11][12] For example, in figure 1, left, wireless waves transmitted by a cellular
phone are received and visualized by a SWIM (Sequential Wave Imprinting Machine) unit, an array of
LEDs, each addressable by the input voltage of the system. [13][14][15] The way the phone was held
avoids placing flesh over the phone's embedded antenna. A user's hand blocking the antenna typically
attenuates the transmitted signal, as human tissue contains mostly water. [16] The attenuation factors of
various materials are compared visually in the figure on the right. [15] This knowledge of sensory
capacity and direction can help users maximize signal strength while using their devices. A product
designer can use this information to detect defects in a sensor, or to optimize component placement so
that antennas avoid areas commonly obstructed by flesh. More applications of this thesis are explored in
section 1.3 - Veillametry applications.
The goal of this thesis is to build systematic methodologies to measure the capacities of arbitrary,
widely used sensors, such as cameras, microphones, and photodiodes, whether as standalone
instruments or as components of a larger system. Using these
collected data, sensory field models and visualizations are produced. An additional goal of this thesis is to
create a framework to visualize the amount of exposure various surfaces receive from sensors. A 3D model
is generated to show the amount of sensory exposure, integrated over time, of various environmental
surfaces exposed to sensory fields, as if these surfaces were photographic film. The resulting
illustration is known as a veillogram, [17] and is created using the concept of veillametry. [18] This thesis
also expands the methodological framework beyond measuring electronic sensors such as microphones and
digital cameras, to include non-intrusive tests that approximate the sensory field of the human eye. [19]
Figure 1: Left: Professor Steve Mann using the Sequential Wave Imprinting Machine (SWIM) to visualize the
strength of signal waves propagated from a smartphone. Right: The visualization of signal reception strength of
electromagnetic waves with air (top), wood (middle) and human flesh (bottom) in front of the receiver. [30]
1.2 Previous veillametry work
Although quantified veillametry is a recently formulated concept and still an active area of research,
examples of related work can be traced back 40 to 50 years from the time of writing of this
thesis. This section explores some political, artistic, and scientific aspects of veillance, and gives
insight into veillance and its role in human-computer interaction.
1.2.1 The wireless television and camera phenomenon
Since the 1970s, Mann has been fascinated with experiments related to video feedback and its
relationship to the sightfield and the ability-to-see of the camera. [20] One of the earliest documented
experiments Mann conducted relating to veillance was one where he had a video camera wirelessly connected to a
television receiver, and moved the screen across the field of view of the camera while facing it, shown in
figure 2, left. The experiment was done in a darkroom, with the screen biased to produce at least some
light output even when unseen by the camera. Since the television screen always displays what the camera
captures, as the screen enters the field of view of the camera, a positive feedback loop forms between the
sensor and the television. With the camera seeing more light, and the screen correspondingly displaying
more light, the screen soon reaches a point of brightness saturation.
An example of video feedback creating fractal effects is shown in figure 2, right, sourced from online.
Figure 2: Left: Experimental setup overview, a television camera feeds into a receiver, while the screen is moved
around the camera. [20] Right: an example of video feedback, where the screen outputs what the camera sees in a
loop, producing fractal effects. [21, image sourced from online]
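The runaway dynamic of this camera-screen loop can be captured in a few lines. The following is a minimal sketch, not from the original experiment, assuming a simple linear display gain, a small constant bias level, and brightness clipped at saturation (the function name `simulate_feedback` and all numeric values are illustrative assumptions):

```python
def simulate_feedback(in_fov: bool, gain: float = 1.5, bias: float = 0.05,
                      steps: int = 40) -> float:
    """Iterate screen brightness: the camera sees the screen (if in view),
    and the screen redisplays what the camera sees, plus a dim bias level."""
    brightness = bias                              # screen idles at the bias level
    for _ in range(steps):
        seen = brightness if in_fov else 0.0       # camera only sees the screen inside its FOV
        brightness = min(1.0, gain * seen + bias)  # redisplay, clipped at saturation
    return brightness
```

With any loop gain above unity, the brightness diverges to the clipping level within a few iterations, mirroring the saturation observed once the screen enters the camera's field of view; outside the field of view the screen stays at the dim bias level.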
The feedback experiment produced very simple visualizations of the optical properties of the camera
used in the experiment, shown in figure 3, left. Mann also applied various colour filters to the output to
distinguish multiple experiment settings conducted with varying brightness threshold values, revealing a
rough estimate of the camera's field of view. This idea was then extended to moving an LED or bulb
in front of a camera system, or a single photodiode, hooked up to an amplifier circuit whose output
connects back to the LED, forming a positive feedback loop. Figure 3, right, shows one of these
examples, with an LED forming a rough pattern. There exists a gap between what is seen
and the actual field of view, because the LED was moved by hand rapidly about the camera in a
long continuous S-shaped zig-zag pattern. The LED gets very dim almost instantly as it leaves the field of
view, but as it re-enters the cone, the feedback delay keeps the LED too dim to be visible
to the camera taking the long-exposure photograph until a moment afterwards. Moving the light source more slowly,
or sweeping the bulb across the field of view in both directions, helps reduce this issue. The
photographs in figure 3 are produced using long exposure photography techniques.
Figure 3: Left: The abakographic result of the camera-screen experiment, with colour filters used to capture a range
of signal sensitivity. [20] Right: The same principles are applied to implement more recent abakographs, with an LED
connected to a photodiode behind a camera lens. The photodiode is connected to an amplifier circuit, and its output
to the same LED. This uses positive feedback to reveal the cone of sight of the camera. [20]
1.2.2 Light painting, SWIM, and abakography as an art form
Light painting is a technique used in much previous work to visualize the accumulated visual output
over a scene, such as from a lightbulb connected to a photodiode in a feedback loop used in experiments.
[14][22] Light painting requires capturing long exposure photographs. Figure
4 [23] shows a comparison of two different exposure times over the same scene. The long exposure
accumulates the light readings and reveals features like light trails.
Figure 4: [23, from online] A comparison of two pictures of the same scene. Left: the image is taken over an
exposure time of 6 seconds; only still objects can be clearly identified. Light exposures, such as the
headlights of passing vehicles, are accumulated (integrated) over time, forming lines of light in the picture.
Right: the image is captured with a short exposure, so it is more ‘instantaneous’ than ‘integrated’.
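The accumulation described above can be illustrated numerically. This is a hypothetical sketch that models a long exposure as the clipped sum of many short frames; the helper name `long_exposure` and the 0..1 pixel scale are illustrative assumptions, not part of the original work:

```python
def long_exposure(frames: list[list[float]]) -> list[float]:
    """Approximate a long exposure: pixel values integrate (sum) over all the
    short frames, then clip at the sensor's saturation level (1.0 = white)."""
    accumulated = [0.0] * len(frames[0])
    for frame in frames:
        for i, value in enumerate(frame):
            accumulated[i] += value        # light adds up over the exposure interval
    return [min(v, 1.0) for v in accumulated]

# A light source visiting a different position in each short frame leaves a trail:
frames = [[0.6 if i == t else 0.0 for i in range(5)] for t in range(5)]
trail = long_exposure(frames)   # every visited position ends up lit
```

A stationary bright source instead pushes a single pixel to saturation, which is the same integration that turns moving headlights into continuous lines of light.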
Light painting, or long exposure photography, combined with a SWIM (Sequential Wave Imprinting
Machine), can be used to visualize natural phenomena photographically. [14][15] The artistic pattern
formed in a long exposure photograph with a moving light source (SWIM) is defined as abakography,
a term coined by Mann. [22] The SWIM is typically designed as an array of light bulbs or LEDs (light emitting
diodes), each addressable by the level of an analog input signal, such as the signal strength of a
wave, the voltage of a photoresistor, or the audio volume received by the corresponding sensor. The SWIM can
be considered a spatial plotter of veillance functions, rather than a temporal one like an oscilloscope. In
this sense, abakography is a form of reality augmentation, allowing users to see what is otherwise
invisible. Some SWIMs are built such that each LED or bulb is linked to its own individual sensor. Figure
5, left, shows Mann holding an earlier SWIM device operated by light bulbs. The middle image is an
abakographic poster, with its background completed using the SWIM shown in the left figure, and the
right figure shows a close-up shot of a miniature version of the SWIM implemented with LEDs. An entire
spatial wave function can be observed in real time if the device is swept fast enough, through persistence
of vision, which lasts about one thirtieth of a second. [24] During this time, optical signals are retained in the
brain even after the light source has moved from one location to another, leaving a light trail observed
and processed in the brain as an augmented overlay over physical space.
Figure 5: The SWIM is a useful tool for creating long-exposure abakographic images. Left: Mann holding one
of his earlier SWIM devices operated by bulbs. Middle: The SWIM on the left is used to create artistic patterns on
the background of an abakographic poster for a hair design studio. Right: A more recent SWIM stick operated by
LEDs, showing the amplitude of a radio signal along an axis in space, rather than in time. The visualizations can be
seen using long exposure photography.
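The SWIM's mapping from analog signal level to lit LEDs can be sketched as follows. This is an illustrative bar-graph interpretation, since actual SWIM hardware drives the lights with analog circuitry; the helper `swim_leds` and the 16-LED default are assumptions for the sketch:

```python
def swim_leds(signal: float, num_leds: int = 16) -> list[bool]:
    """Map a normalized analog signal level (0..1) onto a bar of addressable LEDs."""
    level = max(0.0, min(1.0, signal))     # clamp the input to the displayable range
    lit = round(level * num_leds)          # a stronger signal lights more of the bar
    return [i < lit for i in range(num_leds)]

# Sweeping the bar through space while the signal varies traces out the wave
# amplitude as a function of position in a long-exposure photograph.
```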
1.2.3 Politics of veillance: surveillance and sousveillance
In addition to science and art, there is also a political side of veillance that makes its study valuable.
In this modern era of technology and information, surveillance is present in many public
areas, used for statistics such as traffic counts, and by stores to maintain security, deter crime, and
retain proof or evidence footage. There is value in these sources of information, and hence
they give rise to another form of power: surveillance.
However, surveillance often lacks integrity. [25] If a facility forbids people from using
photographic devices within its premises while operating surveillance cameras itself (an example is
presented in figure 6), then there is no veillance reciprocity, resulting in a power imbalance
between the store and the shoppers. In the case where a store customer harasses or abuses someone
working in the store, there is evidence that favours the store in court, whereas when the customer is
abused by the store, the customer cannot present any incriminating evidence, while the store
may retain or hide its footage. This makes customers vulnerable to abuse. [26][27][28][29][30]
Figure 6: An example of veillance imbalance: a supermarket has signs prohibiting the use of cameras while it
itself monitors and/or records shoppers. This lacks integrity because the footage is controlled by the store,
and is more likely to serve as evidence against shoppers than to help them in a dispute. [27]
Stores also often use cameras covered with shaded shields, to prevent shoppers from seeing the
orientation of the camera, assuming one actually exists in the dome. Knowledge of the direction, the
ability-to-see, and the surfaces visible despite obstacles such as shelves or boxes is also a layer of power
imbalance, by similar reasoning. This lack of surveillance integrity is referred to by Mann as surveillance
hypocrisy. [27][30]
Similarly, there is another form of veillance known as sousveillance. If surveillance, from the French,
means oversight, or watching someone from above, then sousveillance, constructed from the French word
‘sous’ for ‘under’ and ‘veillance’ for ‘watching’, can be interpreted as ‘undersight’.
Sousveillance can be thought of as other shoppers acting as witnesses, or capturing photos with their cameras or
wearable devices. Sousveillance data is accessible thanks to the development and adoption of connected
wearable sensors, distributed cloud-based computing, and the use of social media and other sharing platforms.
The collective watching and witnessing forms a reciprocal power balance with surveillance. [30]
There exist many interpretations of the boundary that distinguishes a camera or sensor as
surveillance or sousveillance; the most intuitive is the spatial jurisdiction method
outlined by Ryan Janzen and Steve Mann. [18] The spatial jurisdiction rule defines surveillance as
information gathering by a sensor within a space owned by the sensor's owner, whereas sousveillance is the
gathering of data by a sensor whose owner does not own that space. Figure 7, left, illustrates surveillance
and sousveillance according to the spatial jurisdiction rule: a university owns both the space (a computer
laboratory) and the camera, so the camera overseeing the lab is an example of surveillance, whereas
a student wearing a camera that records what is going on in that lab is performing sousveillance.
However, if the ceiling camera is aimed out the window and overlooks streets owned by the
government, then the activity of the camera is sousveillance by nature.
Figure 7: Left: Example of surveillance and sousveillance, according to the law of spatial jurisdiction. A property
owner watching over their own property is surveillance, while a visitor's recording is considered sousveillance. [18,
27, 29] Right: Sousveillance domes given out during ACM’s CFP2005 Conference. [28][29]
Current work on the politics of veillance sees surveillance and sousveillance as two pictures of
half-truths. [27, 29] Surveillance or sousveillance alone can lack integrity; only joined together do they form
the full picture, with veillance reciprocity and power balance. Equiveillance is a term coined by Mann as
the equilibrium between surveillance and sousveillance, a point of power balance to be striven for.
[28, 30] Mann's previous work on veillance includes advocating for veillance openness and veillance
integrity. Figure 7, right, shows sousveillance domes given out to participants of ACM's CFP2005
Conference: camera domes that hide the direction of veillance. These were intended to raise awareness of the
need for greater veillance integrity in our society, with its numerous cameras everywhere.
1.2.4 Previous work in veillametry
This subsection showcases some of the earlier veillametry work. Continuing from the earlier
experiments using the video camera and the television screen from section 1.2.1 and the abakography
techniques from section 1.2.2, Mann applied the same principle to LEDs and bulbs connected to feedback
amplification circuits to create long exposure photographs indicating the field of view of these sensors.
As discussed in more detail previously, the light amplifies until it reaches saturation when it is within
range of the photosensor, creating a bright spot on the photograph, whereas it remains fairly
dim outside such regions, just barely bright enough for the amplifier to register. Figure 8, left,
made by Mann, and figure 8, right, made by photographer Chen, are examples of this work using long
exposure photographs. [31][32]
Figure 8: Left: Earlier veillametry work by Mann showing the field of view of a photodiode behind a camera lens.
The photodiode is placed in front of a camera lens system to emulate the veillance effects of a digital camera.
Right: Veillametry using the same principle, also with a detachable optical lens system from a camera, made
by Helton Chen, showing a sparser pattern. [31]
There are also examples of previous work that use similar ideas with different implementations, such
as the two shown in figure 9, which use colour to visualize a veillance field using an Arduino and coloured
LEDs. The image on the left, made by an anonymous student, thresholds the light quantity
read by a photoresistor: if the amount exceeds a certain value, the output colour of the LED is set to
green, and red otherwise. The image on the right, made by Yang, works similarly to the left, but uses
multiple photoresistors arranged in a closely packed square array. A red LED output indicates that
no photoresistor has passed the predefined threshold value, blue or violet indicates that only some
of the photoresistors in the array satisfied the threshold requirement, and green indicates that all of
the sensors satisfy the threshold requirement.
Figure 9: Left: Arduino implementation of the television and camera experiment, using a coloured LED, with
green indicating sufficient veillance exposure and red otherwise. [32] Right: Similar set-up using multiple
photoresistors, with colours indicating the number of photoresistors that have registered sufficient veillance exposure.
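The colour-coding logic of the photoresistor-array version can be sketched as follows. This is a hypothetical reimplementation in Python rather than the original Arduino code; the function name `veillance_colour`, the 10-bit reading range, and the threshold of 512 are illustrative assumptions:

```python
def veillance_colour(readings: list[int], threshold: int = 512) -> str:
    """Colour-code how many photoresistors in the array register sufficient light."""
    passed = sum(r > threshold for r in readings)
    if passed == 0:
        return "red"          # no sensor sees enough light: outside the field
    if passed == len(readings):
        return "green"        # every sensor triggered: fully inside the field
    return "violet"           # partial coverage: near the field boundary
```

Because only some sensors in the array trigger near the edge of the field of view, the intermediate colour effectively marks the veillance boundary as the array is swept through space.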
Plotting entire veillance fields can be cumbersome using only one LED, so an improvement is made by
assembling a line of LEDs, each with its own sensor. The veillance field is then
approximated by replacing the camera with an infrared source: through the optical system, a field of
infrared veillance is emitted into space. The infrared stick is made by coupling an LED output with an
infrared sensor. When the infrared reading exceeds a certain value, the LED it is coupled with
becomes bright, achieving a similar effect to the feedback experiment. Figure 10 shows two long
exposure photographs as examples of Mann approximating the veillance field of a night vision
surveillance dome camera.
Figure 10: Examples of abakographic photographs constructed using the modified SWIM with densely packed
infrared-sensor and LED output pairs. Each LED turns on when its coupled sensor registers a sufficient IR
reading. The veillance fields approximate the field of view of the infrared camera system.
Note that abakography and veillance are not applied only to camera veillance; they have previously been
used to visualize audio waves, radio waves, and other functions that depend on space and, occasionally,
time. Waves captured as a function of time can be generated from sensors that measure values over time,
like an electrocardiographic plate output, or from previously recorded data, as shown in figure 13. [31][33]
Waves captured as a function of space come from systems that read time-invariant waves, or standing
waves (explained in chapter 2), and are shown in figures 11 and 12. [31][34]
Figure 11: Examples of waves captured as a function of space rather than time, by capturing the time-invariant
waveforms explained in chapter 2. [13] Left: Abakographic visualization of electromagnetic waves; the vertical
(green) component visualizes the real part, while the horizontal (magenta) component visualizes the imaginary
part. [31] Right: an abakographic visualization of radio waves received by a radar.
Figure 12: Examples of waves captured as a function of space. The two images show abakographic visualizations of
electromagnetic waves from a radio transceiver. The two frames differ by a phase offset of 90 degrees. [33]
Figure 13: Examples of waves captured as a function of time. Left: Visualization of Mann’s electrocardiographic
reading over a period of time. [34] Right: Visualization of a generated time-variant signal received by a microphone.
1.3 Veillametry applications

There are many applications of veillametry, such as gaming, product design, advertising analysis,
consumer behaviour and psychology studies, program run-time optimizers, understanding and improving
sensor quality, and surveillance justification, to name a few.
With the rapid development of virtual reality (VR) gaming and interaction, users are immersed in a
virtual world by having the visual, audio, and possibly haptic inputs from their environments interpreted
by cameras, sensors, and location or tracking markers. In a way, the cameras they wear are their eyes to
the world. In many stealth games, players may wish to avoid the veillance field of guards or cameras
(shown in figure 14, right) if they are playing as a thief or a spy. Conversely, a player playing as a
guard might want to emit as much veillance onto a thief as possible to maximize their score. In shooter games,
as illustrated in figure 14, left, shooters [Pete, right] may emit veillance fields from camera guns, while
others may wish to avoid exposure to veillance themselves. This is done using a veillance
dosimeter, such as the one worn by the evader [Ryan, left]. The dosimeter measures the accumulated dosage
based on the wearer's presence or absence in the veillance fields of various cameras. [35] In interactive
games, such as many players dancing to the same song, the maximum veillance exposure from an
audience may indicate superior performance and additional in-game score.
Figure 14: Left: A long-exposure photograph showing Pete (right) shooting veillance from a camera, while Ryan
(left) tries to avoid being seen by the camera, minimizing the veillance exposure readings on the veillance
dosimeter that he is wearing. Right: A long-exposure photograph showing Prof. Steve Mann behind a mounted
surveillance camera, with the camera’s field of view visualized.
Figure 15: Long-exposure photographs showing the field of view (thresholded, binary veillance fields) in the
horizontal cross-sectional plane of a veillance field for various infrared sensors. [13] Left: field of view for three
washroom faucets. Right: the field of view for several urinals.
With the ability to clearly measure and model the veillance of various sensors, veillametry used with
abakography can aid product designers during the sensor selection and placement process. [22]
For example, in figure 15, abakographic photographs reveal the coverage boundaries of infrared sensors
placed in hands-free, motion-activated faucets (left) and urinals (right). These boundaries can help
verify the field of view of the (low-resolution infrared) sensors and their aperture size, so that each
sensor covers as much space as possible without extending to neighbouring faucets. The quantified
veillance values may also help determine the optimal threshold values for when a person is too close to
or too far from the sensor. Another example was shown in figure 1: a poorly designed antenna placement
for a phone, which both the designer and the users can act on to get more signal coverage.
Figure 16: Left: Illustration of the human bioveillance of the left eye of participant Ryan Janzen. The veillance
data is obtained through non-intrusive eye tests explained in chapter 6 - Bioveillance. Right: an example of a
typical heatmap program found online [36] that estimates human gaze traffic by treating the eye's veillance as a
mere circle, ignoring the rich, expressive veillance field of the eye across its wide field of view.
As will be explained in chapter 6 - Bioveillance, the concept of veillance can be extended to
biological sensors, since the human eye is analogous to a camera in many ways in terms of
veillance fields, as explained before. For eye patients or cyborgs who use cameras to enhance
vision, a camera actually acts as their eye. [36] Figure 16, left, illustrates the human veillance of the left
eye of a human participant, R.J. Veillametrics can be used to create complex and accurate heatmaps of
where humans are looking. These data are more expressive because they include features such as
peripheral vision, and they are well quantified rather than reduced to a uniform circle or a simple circular
model. Veillance can help extend eye-tracking software to generate better visualizations of visual sensory
heatmaps. Figure 16, right, illustrates a visual heatmap produced by an eye tracker that assumes the human
veillance is a mere circle, which is inaccurate. The term sensory attention is introduced in chapter 7 as a
way to indicate the amount of attention a sensor has on a particular subject. This is done by studying how
in-focus the subject is, indicated by the degree of blurring detected. Since veillance fields are vectors, they
can be projected and tracked in space to create 3D map models known as veillograms. Veillograms are a
very powerful medium for revealing the visual interaction process of a consumer engaging with a product or
an advertisement, and can be used for advertising analysis and for consumer behaviour and
psychology studies. [37] Figure 17 illustrates an example of a 3D veillogram model (right) and the
surfaces the model represents in reality (left).
Figure 17: Left: A photograph of an audio mixer that is the test subject for veillance exposure. A participant wears
an eye tracker while adjusting the equalization (EQ) knobs at the middle left of the panel. Right: a 3D render of the
veillance exposure on the modelled surfaces, rendered in OpenGL.
Another application for bioveillance is program run-time optimizers. In many user-interface (UI) heavy
applications such as video games or computer-aided design (CAD) software, a great deal of graphics
computation is done for all parts of the frame, rendering fine details everywhere, even where the
player or user is not looking. An optimizer could estimate the location of the user's gaze and render the
level of detail according to the level of veillance exposure each part of the frame receives. [38] This process is
known as foveated imaging. [39][40]
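The idea behind such an optimizer can be sketched as a simple distance-to-detail mapping. The function below is a hypothetical illustration, not any cited system: the tile size, falloff distance, and LOD cap are assumed values, and the only point being made is that detail falls off with distance from the estimated gaze point.

```python
import math

def lod_for_tile(tile_center, gaze, base_lod=0, max_lod=4, falloff=200.0):
    """Choose a level of detail (LOD) index for a screen tile based on
    its distance from the estimated gaze point, in pixels.
    Higher LOD index means coarser rendering."""
    dx = tile_center[0] - gaze[0]
    dy = tile_center[1] - gaze[1]
    dist = math.hypot(dx, dy)
    # Every `falloff` pixels of distance coarsens the tile by one level,
    # capped at max_lod.
    return min(max_lod, base_lod + int(dist // falloff))
```

A renderer would call this once per tile each frame, so tiles far from the gaze are drawn at a fraction of the cost.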
Furthermore, an extramissive model of a camera can provide insights into how the sensor operates and
generates what it sees. Understanding a camera through an emission approach can help explain many
imaging phenomena such as blurring. Figure 18 illustrates the veillon-vixel mappings of adjacent
pixels from a digital camera with a built-in lens, whose vixel distributions overlap depending on how in
focus the subject matter is. This causes each pixel reading to be a spatial mix of colour values with its
neighbours, known as optical blurring. [18][41][42] Notice how the blurring kernel matrix (figure
18, right) is analogous to one pixel being the superposition of nearby, overlapping pixels. Knowing the
exact distribution of this overlap from a model, an inverse deblurring kernel (similar to a Laplacian
sharpening matrix) may be estimated using a linear deconvolution process for image correction under the
same setting. Further work in this area uses the extramissive model to create a new method of modelling
sensory veillance, with a new mathematical formulation of veillance and of information propagation in
sensors and electronic systems. [43] This work will be discussed in chapter 7.
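As a rough illustration of linear deconvolution, a known blurring kernel can be divided out in the frequency domain. This is a minimal sketch using a regularized (Wiener-style) inverse, not the formulation developed in chapter 7; the constant `eps` is an assumption added to keep near-zero kernel frequencies from amplifying noise.

```python
import numpy as np

def deconvolve(blurred, kernel, eps=1e-3):
    """Invert a known blurring kernel in the frequency domain.
    eps regularizes frequencies where the kernel has little energy,
    giving a crude Wiener filter rather than a raw inverse."""
    H = np.fft.fft2(kernel, s=blurred.shape)  # kernel spectrum, zero-padded
    B = np.fft.fft2(blurred)
    # Divide spectra; the eps term suppresses amplification where |H| ~ 0.
    Q = B * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(Q))
```

For example, an impulse image blurred (circularly) with a 3x3 box kernel is recovered almost exactly when the kernel is known and noise is absent; with real noisy images, larger `eps` trades sharpness for stability.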
Figure 18: There is a strong relationship between vixel distributions and their mappings, and imaging phenomena
such as blurring. [41] Left: Diagram showing vixels and vixel rays, each corresponding to a pixel of the camera
sensor array. [18] A vixel expands through space, and a surface closer to the camera has a higher vixel density
relative to a farther surface. Each pixel value is the sum of a distribution of colour intensities over a vixel.
These distributions may overlap, contributing to blurring. Right: A commonly used blurring kernel matrix found
online [42]. The blurred pixel is the sum of a distribution over neighbouring pixels and itself.
1.4 Veillance classification and thesis objective

Veillance, or equivalently, metaveillance, is the capability of sensors to sense, without any political
context. In this context, Mann has further classified metaveillance into two classes: metaveillographs and
metaveillograms. Figure 19 contrasts the two. Metaveillographs are data sets that describe the
mask, or field of view, of a sensor, obtained photographically using feedback techniques, while
metaveillograms are quantified veillance measurements extended from the metaveillograph masks, shown
on the right side of figure 19. One main objective of this thesis is to extend the metaveillograph
framework shown in the previous-works section to provide quantified veillance measurements through
space, and to add metaveillograms to the veillance framework.
Figure 19: Left: Metaveillograph duplicated from figure 3, visualizing the veillance fields captured from previous
work. This work captures the field of view of the camera, but is somewhat inaccurate in terms of the actual
quantities of veillance. Right: Metaveillogram duplicated from chapter 3, where a model of veillance data is
collected and then visualized, showing the veillance field of a camera through space.
1.5 Thesis organization

With a better understanding of what veillance and veillametry are through previous work, the need for
quantified metaveillograms is justified. Chapter 2 describes the methodology employed to quantify
veillance flux by measuring veillance data across various modalities, using a precision scanner and a fixed
stimulus source. Chapter 2 also presents data-gathering techniques that maximize data sensitivity by
increasing the dynamic range of the data using high dynamic range (HDR) methods. [44][45] Chapter 3
focuses on data representation and visualization; the chapter also verifies the optical veillance model
assumptions and compares results with product specification sheets. In chapter 4, emission theory is
introduced in the context of veillance flux modelling, with concepts such as veillons and vixels. Applying
the emission theory, chapter 5 explains the methodology for creating a 3D model of sensory attention
emitted onto various surfaces. Chapter 6 extends the concept of veillametry from digital sensors to include
complex ones such as our eyes; the veillance of the eye is approximated using a series of non-intrusive eye
tests, and similar methodology may be applied to other human senses, such as hearing. In chapter 8, the
concept of a vixel is studied in further detail, and the veillance model is improved to account for the
non-uniform distribution of sensory capacity over the vixel region. A new mathematical framework for
veillametrics is proposed as a way to model the propagation of sensing capacities from sensors through space.
Chapter 2: Gathering veillance data

This chapter describes in detail the veillance data gathering process used for this thesis. The chapter starts
with an explanation of veillance quantity, then introduces the apparatus used in this setup, and finally
describes the means to gather accurate readings for a photodiode, a camera, and audio transducers.
2.1 Methodology to measure units of veillance

As established before, veillance is a sensor's inherent ability to perceive a controlled stimulus
source, as a function of the space around the sensor. In these experiments, the output level of the stimulus
is controlled, and the stimulus is designed or angled so that its output level is fixed with respect to the
sensor input. Veillance, or how well the sensor perceives these fixed stimuli, is unitless in this thesis
for visualization purposes, although it could carry units such as joules of energy per square centimetre,
as an irradiance function of radial distance. The relative ability to sense is normalized by the maximum over
all sample points within the scanning range, which are all non-negative. The reason for producing a
unitless, normalized measurement is to simplify comparisons between sensors of different modalities,
such as light energy perceived by photosensors and wave amplitudes for audio inputs. In other words, the
independent variable is the position of the stimulus, and the dependent variable is the relative sensor input
level at each stimulus location.
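The normalization step described above can be sketched directly. This is a minimal illustration assuming, as stated, that all readings are non-negative:

```python
import numpy as np

def normalize_veillance(samples):
    """Scale non-negative sensor readings to [0, 1] by dividing by the
    maximum over all scanned sample points, producing the unitless
    relative veillance measure used for visualization."""
    samples = np.asarray(samples, dtype=float)
    peak = samples.max()
    # Guard against an all-zero scan (no veillance registered anywhere).
    return samples / peak if peak > 0 else samples
```

Because every modality is reduced to the same [0, 1] scale, a photodiode scan and a microphone scan can be plotted and compared side by side.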
2.2 Generic experimental setup

This chapter explores the process of gathering experimental data for microphones, photosensors, and
cameras, each with its unique differences in setup described in its related section.
Although there are minor modifications for each data modality, each explained later in its own
subsection, all setups use the same general method of moving the stimulus around the sensor. The idea is to
adhere the sensor to a fixed location, and to fix the stimulus to a moving robotic part, such as a 3D
printer head or a robotic arm. The arm moves in a controlled manner in front of the sensor, and the
intensity of the stimulus is recorded at every step of the motion. Sensor input data is collected as the
stimulus moves along its path. Figure 20 shows the operations diagram for the data gathering process.
The process contains the following main modules: continuous signal generation module, coordination
module, navigation module, receiver sensor module, and data storage module.
Figure 20: The general operations diagram of the experimental setup. A stimulus source such as an LED or a
transducer is moved in front of the sensor along a controlled path using navigation components such as motors.
The location of the stimulus is tracked, and its output values at specific known locations are sampled by a
controller. The values at these sampled intervals are then recorded and stored by a sensor connected to storage.
The signal generation module generates a fixed signal that has consistent output and is in phase with the
sampling sensor. The stimulus is also designed to minimize differences in readings caused by varying the
relative angle between the sensor and the stimulus. For example, one way to reduce angle-dependent error
is to surround an LED with a spherical diffuser, rather than using the bare LED as a stimulus
(otherwise the plotter would have to adjust the angle of the stimulus to face the sensor at all times). The
navigation module moves the stimulus along a specific pattern in front of the sensor, and the coordination
module synchronizes the data sampling using timed sampling and/or acknowledgement packets that stop
and resume the motors. The coordination module ensures that the sampled points are as spatially uniform
as possible. The receiver module stores the sampled data into a parsable file to be analyzed at a later stage.
Three types of navigation modules were available in the laboratory in which these experiments were
conducted: 2D and 3D Cartesian plotters, and a 3D delta plotter. Each has its advantages, such as data
collection speed, plotting resolution, accuracy, reduced recoil, and the spatial dimensionality of the
scanner. These plotters are employed to move the stimulus around the sensor for data collection.
2.2.1 3-Dimensional Cartesian plotter

One of the earliest plotters constructed for this purpose was assembled from motor parts into something
similar to a 3D printer. [46] The plotter is built from an Arduino unit, LEDs, plywood, pegboard, and printer
parts. As shown in figure 21, a pegboard is used as the base that holds the power supply and the plotter.
The plotter itself consists of two sets of orthogonal motor-and-belt systems in the horizontal plane. By
rotating the motor in the x direction, the entire attached plywood board shifts along the bottom belt.
The plywood board houses the motor controllers, the vertical motors, and the vertical screw shaft base.
When the z-direction motors are in motion, the attached vertical screw shafts turn and raise or
lower the metal bridge above the board, depending on the direction of rotation. Finally, the y-direction
motor, which is attached to the bridge, controls the position of a plastic housing that moves along the
bridge. The housing contains an RGB LED, which can generate a wide range of colours based on the analog
inputs to its three channels. The colours of the LED and the positions of the motors are controlled by the
outputs of the attached Arduino unit, which is connected to a computer through a USB interface.
Figure 21: Labelled photographs of the modified 3D printer, which now positions a print head or an LED at a
specified coordinate. The base of the plotter holds the power supply and the plotter unit. The plotter unit consists
of motors that move the LED head to locations specified by the connected computer. Through USB, the
connected machine can also specify the colour output of the RGB LED.
The connected computing device runs a program that uses serial interfaces and the TCP protocol to keep
the data collection and the motor movements in sync by communicating acknowledgement packets back
and forth. For every coordinate, the data is recorded before a new motor coordinate is sent to the plotter.
Figure 22 demonstrates the fine accuracy of this plotter by plotting wavelet and chirplet-transform
functions, such as the wave, wavelet, and chirp shown at the top, using abakography techniques. [22]
A chirplet is shown at the bottom; a chirplet is a wave that varies in frequency as a function of time. [47]
The resolution of this plotter is about 0.7 mm per tick in the horizontal plane, and about 0.05 mm per tick
in the vertical direction.
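The chirp and chirplet traced by the plotter can be described analytically. The sketch below assumes a linear frequency sweep and a Gaussian envelope; the functions actually programmed into the plotter may differ, so the parameter names here are illustrative only.

```python
import numpy as np

def chirp(t, f0, f1, duration):
    """Linear chirp: instantaneous frequency sweeps from f0 to f1 (Hz)
    over `duration` seconds. Phase is the integral of frequency."""
    k = (f1 - f0) / duration            # sweep rate, Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

def chirplet(t, f0, f1, duration, center, width):
    """A chirp under a Gaussian envelope, giving a localized 'chirplet'
    rather than an everlasting wave."""
    envelope = np.exp(-0.5 * ((t - center) / width) ** 2)
    return envelope * chirp(t, f0, f1, duration)
```

Sampling either function along the plotter's motion path, with the value mapped to LED height or brightness, reproduces the traces seen in the long-exposure photographs.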
Figure 22: Abakographic photographs of programmed wave, wavelet, and chirplet functions plotted in 3D. Top
left: a wave, traced as a spiral. Top middle: a wavelet, a wave under a sinusoidal envelope. Top right: a chirp,
a wave that changes in frequency as a function of time. Bottom: monochrome and RGB prints of chirplets.
In the actual experiments, the LED program is modified to produce a steady blue colour in the space
in front of the sensors. In spite of its high resolution, this plotter has the smallest plotting region of the
three: about 40 centimetres by 40 centimetres in the horizontal plane and about 35 centimetres in the
vertical direction. In addition, the Cartesian printer uses fine threaded screws to lift one motor system
and the LED housing, which makes any motion in the vertical direction very slow, reducing scanning
speed significantly when moving the heads.
2.2.2 3-Dimensional delta plotter

To address the Cartesian plotter's slow vertical axis, a delta plotter is employed. [48] The delta plotter is
modified from a 3D delta printer. The main disadvantage of Cartesian designs is that they tend to carry one
or more of their moving platforms on another, larger platform, carrying a heavier load and reducing the
plotting speed. A delta plotter instead supports the printhead, or the LED, with three identical motor
systems aligned vertically in parallel, as shown in figure 23. From the top view, the three belts are
equidistant from the centre of the plotter, forming an equilateral triangle. Each belt is attached to an arm
holder that holds metallic rods connected to a small platform, which holds the stimulus unit.
Figure 23: Left: a photograph of the delta 3D printer used in the experiments; a small platform is added above the
plotter area to position test sensors. Right: a 3D delta printer model sourced online [49], showing the mechanical
working principles of the printer.
Examining the working principles of the plotter: suppose only one of the three rod pairs is connected to
the plate. With its holder held stationary, the plate can move freely over a partial spherical surface centred
at the holder, with radius equal to the length of the rods. Attaching a second arm constrains the platform
to a spherical arc on that surface, equidistant from both holders. Adding the last holder constrains the
possible locations down to two points on that arc; as the arms are rigid, the upper position is disregarded.
The employed algorithm uses basic trigonometry to position the housing at a specific coordinate by
computing the new location of each holder in turn: each holder's location is where a sphere of one rod's
length, centred on the target position, intersects its motor belt. The displacement is then translated into
motor movement instructions.
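The inverse kinematics described above can be sketched in a few lines. The tower radius and rod length below are assumed placeholder values, not measurements of the actual printer; the key relation is that each carriage (holder) must sit exactly one rod-length from the effector joint, so its height follows from the Pythagorean theorem.

```python
import math

# Belt (tower) positions for an equilateral-triangle layout of radius R (cm),
# assumed for illustration.
R = 10.0
TOWERS = [(R * math.cos(a), R * math.sin(a))
          for a in (math.pi / 2,
                    math.pi / 2 + 2 * math.pi / 3,
                    math.pi / 2 + 4 * math.pi / 3)]
ROD = 15.0  # rod length (cm), also assumed

def carriage_heights(x, y, z):
    """For a target effector position (x, y, z), compute each carriage
    height along its vertical belt: the carriage is one rod-length from
    the effector, so z_i = z + sqrt(ROD^2 - horizontal_distance^2)."""
    heights = []
    for tx, ty in TOWERS:
        d2 = (tx - x) ** 2 + (ty - y) ** 2
        if d2 > ROD ** 2:
            raise ValueError("target out of reach of this tower")
        heights.append(z + math.sqrt(ROD ** 2 - d2))
    return heights
```

Taking the positive square root corresponds to discarding the upper of the two geometric solutions, as the text notes; the three resulting heights are then converted into stepper-motor tick counts.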
Figure 24, left, shows a sketch of the delta plotter. A triangular platform is attached above the moving
stimulus to hold and secure the sensors. During the experiment, the sensor is held stationary while the
stimulus sweeps across the space below. The right side of the figure shows a chirplet produced by the delta
plotter, demonstrating its fine accuracy. The belts are driven by stepper motors similar to the previous
setup, with approximately 0.7 mm per tick. The printhead travels within a confined triangular prism region
of about 20 centimetres side length and 50 centimetres height.
Figure 24: Left: a diagram (with front, top-side, and top views) of the delta 3D printer used in the experiments. A
small platform is added above the plotter to hold the sensors stationary. The hanging platform is able to hold
multiple transducers with adjustable spacing. Right: an abakographic photograph of the delta plotter in motion,
producing a light trail of a wavelet through space.
2.2.3 2-Dimensional Cartesian plotter

Due to the symmetric behaviour of the veillance fields of most everyday sensors, the limited scanning
space, and the slow scanning rate of three-dimensional plotters, a two-dimensional plotter was also created
to address these challenges and produce high-accuracy (6000 steps high by 8000 steps wide over a
plottable area of 65.5 cm by 49.5 cm), time-efficient data. The 2D Cartesian plotter is a simplified version
of the 3D plotter outlined in the previous section, as shown in figure 25, left. Similar to the other plotters,
the stimulus platform is attached to a string, with its location controlled by a motor. The vertical platform
is in turn movable along the horizontal axis by another string controlled by the x-direction motor, as
labelled. Various sensor mounts are positioned on the left edge of the plotter.
Figure 25: Left: a photograph of the 2-dimensional plotter used for data collection. Right: angled and front views
of 3D models of the plotter. A television screen can be mounted onto the plotter to show visualizations as the data
is being collected. The screen is not shown in the photograph on the left.
2.3 Gathering photodiode veillance data

A simplified study of video veillance is conducted to understand how a single-pixel camera (one
photodiode) behaves veillametrically when placed inside a surveillance dome identical to the one shown
in figure 26, left. When placed in a surveillance dome, the veillance of the photodiode is subject to the
optics and obstructions introduced by the dome, as the original camera would be. Single-photodiode
sensors are commonly found in proximity sensors used in bathrooms, such as hands-free sinks and toilets,
although there they might be placed behind miniature shields instead. [31][33]
The experiment is conducted inside a darkroom using the two-dimensional plotter. An LED is positioned
in the corner of the plotter, with the plotter aligned as closely as possible to the central optical axis of the
camera, with an error of less than 5 millimetres. The LED is moved robotically in vertical strides,
producing a zig-zag pattern, while a microcontroller is programmed to integrate the output of the
photodiode, connected to an ADC (analog-to-digital converter), over the sampled time to produce an
integer value. Due to time constraints, data points are sampled at a resolution of 300 by 400 points,
consistent with the aspect ratio of the plotting area. This data is then normalized, visualized, and rescaled
as in figure 26, right. For more details on data modelling, visualization, and analysis, please consult
chapter 3 - Data visualization and analysis.
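The vertical-stride zig-zag sweep can be sketched as a boustrophedon path generator. This is an illustrative sketch, not the actual plotter firmware; grid dimensions are parameters standing in for the scan resolution.

```python
def zigzag_path(cols, rows):
    """Generate a boustrophedon (zig-zag) scan over a cols x rows grid:
    sweep each column vertically, alternating direction on successive
    columns, so the head never makes a long return stroke."""
    path = []
    for c in range(cols):
        # Even columns go bottom-to-top, odd columns top-to-bottom.
        ys = range(rows) if c % 2 == 0 else range(rows - 1, -1, -1)
        path.extend((c, y) for y in ys)
    return path
```

Driving the LED along `zigzag_path(300, 400)` visits each of the 300 by 400 sample points exactly once while keeping consecutive positions adjacent, which keeps travel time, and therefore total scan time, low.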
Figure 26: Left: a photograph of the camera dome used for the photodiode housing. The photodiode replaces the
original camera to create a new single-pixel camera system that simulates the effects of the original interior
camera. Right: a veillance visualization of the photodiode camera shown on the left, taken from chapter 3 -
“Visualizing and analyzing veillance data”.
2.4 Gathering camera veillance data

Next, the veillance data of a selected video camera is captured: specifically, a Blackfly camera unit,
manufactured by Point Grey, mounted to the side of the plotter. As the only independent variable of the
experiment should be the position of the stimulus, the camera is configured with its auto-exposure,
white balance, auto-focus, anti-aliasing, auto-correct, anti-shake, and various other automation features
disabled or set to manual. The camera is configured to act as closely as possible to a light meter
measuring photoquantities for each of its pixels. In this experiment, each frame of the video is 640 by 480
pixels with a constant exposure time. Each pixel reading represents the cumulative veillance value, over
its individual vixel, during the exposure time.
Since there is a variable amount of delay between successive frames from the video camera, sampling the
data at fixed time intervals would no longer guarantee a uniform sample spread in the space in front of the
sensor, as was the case with the photodiode. A server-client acknowledgement system is therefore
implemented over TCP to address this issue. The plotter, or server, moves the LED head to the
precomputed position before sending an acknowledgement to the client, which then captures a frame and
persists it to memory. When that is done, a backward acknowledgement is sent to the motor system, which
moves the printhead to the next location. The communication continues until all required samples are
read and recorded to disk for further analysis. Due to noise present in some of the data, some recorded
values are the average of multiple tests, typically between 1 and 3 runs, depending on the data's
signal-to-noise ratio.
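The server-client acknowledgement scheme can be sketched with a minimal loopback example. The message format here ("AT x y", "SAVED", "DONE") is invented for illustration and is not the actual protocol used; the point is that each frame capture is bracketed by a forward and a backward acknowledgement, so every stored frame maps to a known coordinate.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 0  # port 0: let the OS pick a free port

def plotter_server(sock, positions, log):
    """Move to each position, announce it, then wait for the client's
    'SAVED' ack before advancing to the next coordinate."""
    conn, _ = sock.accept()
    with conn:
        for pos in positions:
            log.append(("moved", pos))                 # motors settle here
            conn.sendall(b"AT %d %d\n" % pos)          # forward ack
            assert conn.recv(64).strip() == b"SAVED"   # backward ack
        conn.sendall(b"DONE\n")

def camera_client(port, log):
    """Capture one frame per announced position, acknowledging each."""
    with socket.create_connection((HOST, port)) as s:
        f = s.makefile("rb")
        while True:
            msg = f.readline().strip()
            if msg == b"DONE":
                break
            log.append(("captured", msg.decode()))     # grab + store frame
            s.sendall(b"SAVED\n")

log = []
srv = socket.socket()
srv.bind((HOST, PORT))
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=plotter_server,
                     args=(srv, [(0, 0), (0, 1), (1, 1)], log))
t.start()
camera_client(port, log)
t.join()
srv.close()
```

Because the server never advances until the client confirms the frame is persisted, variable camera latency cannot smear a frame across two positions.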
2.4.1 High dynamic range (HDR) imaging techniques
One of the main challenges in obtaining reliable estimates of the quantity of light, or photoquantity, of a
scene from a digital camera is that the camera applies a monotonic, nonlinear response function mapping
light to its output pixel values (8 bits per channel for most digital cameras). In other words, the pixel
values of an image do not directly describe photoquantity or veillance power; they are a compressed
function of photoquantity, which needs to be remapped.
Furthermore, when looking at an environment through a camera or with human eyes, an excessive amount
of light in the scene causes overexposure (shown in figure 27, middle), while a lack of light causes
underexposure (shown in figure 27, left). In either case critical information is lost due to the low dynamic
range of the input frames, whether from eyes or from the camera. [50]
Figure 27: Three images of the same scene, containing objects such as chalk and screws, and the chalk-written
word “Nessie” in the foreground. Left: photograph taken under low exposure settings; darker details that require
longer exposure time are not captured, such as the darker details of the screws and the shaded side of the chalk.
Middle: photograph taken under a high exposure setting, revealing darker details; however, the overexposure
floods the well-lit regions of the scene, causing lighter details such as the letters and the bright side of the chalk
to become an indistinguishable white blob. Right: High dynamic range image composited from a multitude of
exposures, revealing details from both the light and the dark regions of the scene.
In an extremely underexposed image, most of the data is compressed within the lower pixel values. In the
extreme case, even the brightest spots might be only one pixel value higher than the darkest pixel, and
much of the information is lost in the discretization that maps photoquantity to a pixel value. In this case
the dynamic range of the photo is low (or binary, in the extreme example). To expand the range of
photoquantities captured, a higher exposure may be applied to reveal additional detail. However,
overexposure also causes loss of information: consider the other extreme case where all the pixels are
saturated due to a long exposure time, and there is no longer any differentiation between the bright and
dark subjects in the scene. Overexposed images therefore also have lower dynamic range, as shown in the
earlier example with the chalk and its shadows in figure 27. The proposed HDR method should aim to
eliminate data loss, increase data range, and estimate the photoquantity mapping. [44][45]
Formally, the dynamic range of an image is defined as the ratio between the largest quantity that will not
overexpose the sensor and the smallest positive quantity for which changes in the quantity remain
discernible. [51] This chapter will describe some of the techniques used to increase the dynamic range of
the data; as a byproduct, the mapping between photoquantities and pixel values is obtained. This is done
by taking multiple images of the identical subject matter, with each image exposed a constant multiple of
the exposure of the previous one. Details from the light and dark regions are extracted from the various
exposures, and after expanding the values into light quantities, or what is referred to as lightspace (the
compressed data being referred to as image space), the various details are recombined and weighted. The
estimated photoquantities, mapped back to pixel values using a compression algorithm, constitute the
HDR reconstruction. [44]
Before explaining the algorithm, some background information is in order. Cameras do not produce linear
output based on the perceived photoquantity (the amount of light present at a scene), as cameras usually
have dynamic range compressors built in. There are many reasons to have this compressor, such as
accommodating the wide range of lighting and colors in various scenes. Historically, it was built to
accommodate the non-linear expansion that happens on cathode ray displays. Instead of correcting the effect
by building circuits into the televisions, the circuit was built into cameras to compress the image, to be
expanded again by the television so that the color is approximated with better accuracy on cathode ray
televisions. [52] Figure 28 shows the compression and expansion process for photoquantities. The
compression function is also known as the camera response function (CRF). The CRF is a device-dependent
function, specific to an individual camera and its settings, that maps the photoquantity present (represented
as the variable q) to pixel values. Even when using the same camera, a change in aperture or
other optics settings changes the CRF. In the figure the CRF is labelled as the
compressor, and has the function operator f; f(q) represents the pixel output value of the camera after
compression. The inverse function must be derived if the true photoquantities are to be approximated.
Figure 28: Diagram showing the signal processing of a camera and a cathode ray tube. [53] The amount of light on
a subject matter perceived by a camera through its optical system is the photoquantity, q. The photoquantity is then
compressed by the camera, through the compression (camera response) function f, to be stored and/or displayed as
pixel values f(q). The data needs to be expanded by a function similar to the inverse of f for the image to be
properly seen on a display.
The compressor function (simple version), f, as a function of photoquantity q is proposed as:

f(q) = ( q^a / (q^a + 1) )^c

where a and c are constants. [53]
Given multiple frames of identical subject matter, with each frame exposed a constant multiple of the
previous frame, the response function can be derived using comparametric analysis. A
comparagram is a graph where a function is plotted against the same function at a different exposure. In this
case, one frame is compared to another frame with k times more exposure (k is set to 2 in this chapter); in
other words, f(2q) is plotted against f(q). The comparametric equation g(f(q)) = f(kq) describes the curve of the
comparagram.
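As an illustrative sketch (not the thesis's own code), the comparametric relationship can be checked numerically; the compressor model f(q) = (q^a / (q^a + 1))^c and the parameter values a = 2, c = 0.5 are assumed here purely for demonstration:

```python
import numpy as np

def f(q, a=2.0, c=0.5):
    # Assumed simple compressor model: f(q) = (q^a / (q^a + 1))^c
    qa = q ** a
    return (qa / (qa + 1.0)) ** c

def g(fq, k=2.0, a=2.0, c=0.5):
    # Closed-form comparametric function satisfying g(f(q)) = f(k*q)
    u = fq ** (1.0 / c)                       # recovers q^a / (q^a + 1)
    return ((k ** a) * u / ((k ** a - 1.0) * u + 1.0)) ** c

q = np.linspace(0.01, 10.0, 100)
assert np.allclose(g(f(q)), f(2.0 * q))       # comparagram curve matches f(2q)
```

Note that g operates purely on pixel-value space: it never needs q itself, which is why the comparagram of two images suffices to fit the model.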
Figure 29 shows a partial comparagram generated from two images varying in exposure. The
comparagram is a histogram of the occurrences in which the output of one image, f(q), corresponds to a
(typically higher or equal) pixel value when the exposure is doubled, in other words g(f(q)), or f(kq). The
comparagram is a discrete table of 256 by 256 entries (for a camera with 8-bit channels), populated with the
number of pixel pairs of the images per color channel. For example, looking at figure 29, say the particular
pixel under the green crosshair has the value 58 (for the blue channel) in the lower exposed photograph,
and 82 in the highly exposed photograph; then the element (58, 82) in the comparagram table is
incremented by one. This is done for every pixel of the two images, and the process is repeated for every
colour channel of the image. If the HDR involves more than two exposures, this computation
is done for every two consecutive exposures.
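The population of the comparagram table can be sketched as follows (a minimal single-channel version, assuming 8-bit images; the function name comparagram is illustrative):

```python
import numpy as np

def comparagram(low, high, bins=256):
    # Joint histogram: rows index the low-exposure pixel value,
    # columns index the corresponding high-exposure pixel value.
    J = np.zeros((bins, bins), dtype=np.int64)
    np.add.at(J, (low.ravel().astype(int), high.ravel().astype(int)), 1)
    return J

# toy example: the pixel-value pair (58, 82) occurs twice
low = np.array([[58, 10], [200, 58]], dtype=np.uint8)
high = np.array([[82, 30], [255, 82]], dtype=np.uint8)
J = comparagram(low, high)
assert J[58, 82] == 2
```

In practice this is run once per colour channel and once per consecutive exposure pair, as described above.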
Figure 29: A comparagram [256x256 pixels] composed from the two images to the left. The comparagram is a
histogram of all the pixel values of the images. The horizontal index is the pixel value of the low exposure image
and the vertical index is the pixel value of the high exposure image.
Given the constraint g(f(q)) = f(kq), and the proposed compressor model above, one possible solution for g
with respect to f can be derived as:

g(f) = ( k^a f^(1/c) / ( (k^a - 1) f^(1/c) + 1 ) )^c
where q, the photoquantity recorded by the sensor, is known to be a positive real value. [53] The model
has two degrees of freedom, a and c. Using numerical techniques such as gradient descent, the values of a
and c can be computed such that the root mean square error over the comparagram's points is
minimized. Figure 30 shows a complete comparagram, with the fitted curve drawn in red. Note that the
histogram illustrated is normalized by the maximum entry.
Figure 30: A fitted comparagram of a series of images taken outdoors. One sample point is incremented
each time a pixel value of the lower exposure corresponds to a value of the higher exposure. The
points are normalized and fitted to produce values of a and c that minimize data error.
With the values of a and c computed, the compression function is complete, and how the camera responds to
photoquantity and maps it into pixel values becomes clear. To estimate the photoquantity present in the
photograph, the function can be inverted and rearranged as:

q = ( f^(1/c) / (1 - f^(1/c)) )^(1/a)
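Assuming the compressor model f(q) = (q^a / (q^a + 1))^c, the inversion can be sketched as follows (the clipping guards against the saturated endpoints, where the inverse is undefined):

```python
import numpy as np

def f(q, a, c):
    # assumed compressor model
    return (q ** a / (q ** a + 1.0)) ** c

def f_inverse(fq, a, c):
    # q = (u / (1 - u))^(1/a), with u = f^(1/c); clip u away from 0 and 1
    u = np.clip(np.asarray(fq, dtype=float) ** (1.0 / c), 1e-9, 1.0 - 1e-9)
    return (u / (1.0 - u)) ** (1.0 / a)

# round-trip check: compress then decompress recovers the photoquantity
q = np.array([0.5, 1.0, 3.0])
assert np.allclose(f_inverse(f(q, 2.0, 0.5), 2.0, 0.5), q)
```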
However, as seen in figure 30, it is apparent that there is a large amount of overexposure in the frames
that were selected. This can be seen from the steep curve, and from the fitted table saturating nearly halfway.
Selecting the appropriate frames is essential for the algorithm to work well, but sometimes, due to the
nature of the subject matter, some regions might always be extremely under- or overexposed. To account
for this, the algorithm also incorporates a certainty component into the photoquantity estimation.
Consider a portion of an overexposed image, where the output of the image is always saturated at 255. A
significant change in the amount of photoquantity in these regions may result in little to no change in the
pixel value the camera outputs, and therefore the camera as an instrument works poorly and
has little certainty in its predictions of photoquantity value. Although there are numerous advanced
algorithms to compute this certainty, [44][45] one model estimates the certainty, c(q), using the rate of
change of the pixel output, f(q):

c(q) = df(q)/dq
As shown in figure 31, from Mann's Intelligent Image Processing textbook, [53] the certainty of the
prediction is highest where the rate of change in the pixel value is at its maximum. At regions of the
comparagram where the pixel value barely changes while photoquantities are changing significantly, there
is a large recovery error when reversing this mapping, leading to low certainty. Note that the
figure is on a logarithmic scale.
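A numerical sketch of this certainty model, approximating c(q) = df/dq by finite differences (the compressor model and its parameters are assumed here purely for illustration):

```python
import numpy as np

def certainty(q, f, eps=1e-4):
    # c(q) ~ df/dq via central differences: a steep response means small
    # changes in photoquantity remain resolvable, hence high certainty.
    return (f(q + eps) - f(q - eps)) / (2.0 * eps)

f = lambda q: (q ** 2 / (q ** 2 + 1.0)) ** 0.5   # assumed compressor model
q = np.linspace(0.1, 10.0, 50)
cq = certainty(q, f)
assert cq[-1] < cq[0]   # certainty collapses in the saturated region
```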
Figure 31: Logarithmic graphs showing the relationship between the quantity of light, or photoquantity, and the
pixel value produced by a digital camera. The compressor function is a nonlinear, nondecreasing function. The
certainty function of the compressor is simply the derivative of the response function with respect to
photoquantity: at regions where the pixel value changes rapidly with changing levels of photoquantity, the sensor
is more sensitive to change and therefore more certain of that change, whereas the derivative is very small at
extreme exposure regions.
Now to put the pieces together. After the photoquantity is estimated via the inverse compression function
of the image pixel value, the result is multiplied by the certainty function derived above, and then scaled
depending on the exposure of that image. For example, assuming image B has twice the exposure of
image A, the resultant photoquantity of image B would have to be halved to be directly comparable to
image A, since image B theoretically records twice the photoquantity from twice the exposure time.
Lastly, the individual frame estimates are summed and normalized to produce the cumulative estimate.
This is shown on the left half of figure 32.
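The combination step can be sketched as a certainty-weighted average in lightspace (the function names are illustrative; f_inv and cert stand in for the derived inverse response and certainty functions):

```python
import numpy as np

def combine_exposures(pixel_frames, exposures, f_inv, cert):
    # pixel_frames: list of arrays (one per exposure); exposures: relative
    # exposure factors, e.g. [1, 2, 4]; f_inv maps pixel value -> photoquantity;
    # cert maps pixel value -> certainty weight.
    num = np.zeros_like(np.asarray(pixel_frames[0], dtype=float))
    den = np.zeros_like(num)
    for frame, k in zip(pixel_frames, exposures):
        frame = np.asarray(frame, dtype=float)
        q_hat = f_inv(frame) / k      # k-times exposure => divide estimate by k
        w = cert(frame)
        num += w * q_hat
        den += w
    return num / np.maximum(den, 1e-12)
```

As a sanity check, with an identity response and uniform certainty, frames [0.3, 0.6, 1.2] at exposures [1, 2, 4] all collapse to the same photoquantity estimate, 0.3.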
Figure 32: Illustration summarizing the HDR process for photoquantity estimation, analysis, and recompression for
display purposes. For each individual image, exposed a multiple of the exposure time of its predecessor, the camera
perceives the photoquantity and interprets it as a compressed pixel value. The value needs to be expanded by
deriving the inverse response function using comparametric analysis. The amplitude of the photoquantities is then
adjusted by the exposure time to make the image comparable to the others, and finally summed and normalized.
The summed data is needed for veillametry analysis, but needs to be recompressed for visualization purposes.
For veillance studies, photoquantities are the desired measurement for analysis, as they are more descriptive of
the amount of light than pixel values and have a much higher dynamic range. However, this data would
be difficult to visualize, as most image formats support only 8-bit data. In order to show these data as
colormaps, the photoquantities need to be remapped, or recompressed, to a low dynamic range to
accommodate display input requirements, after the required operations are completed in lightspace.
Figure 32 illustrates the entire HDR process when processing video veillance data. Although the diagram
only shows the steps for three images, the process can be applied to any arbitrary number of images. Please note in
this diagram that k1, k2, and k3 form a geometric sequence representing image exposure levels: if k1 starts at
exposure value q0, then k2 would be 2q0, and k3 would be 4q0, and so on (factors of 2). The c
operator refers to the certainty function, which is multiplied with the photoquantity estimates. For the
interested reader, there are papers on advanced statistical models for certainty estimation, [44] methods to
generate compressor models that produce finer color tuning, [54] and FPGA HDR implementations that
aim to optimize runtime. [55]
Figure 33 is meant as an illustration showing the intermediate results for a similar HDR method: recursive
pairwise comparametric image compositing. Instead of summing or averaging the photoquantities, an
intermediate composite is produced for every two consecutive images, and each iteration of the program reduces
the number of frames by one, so the process terminates in n-1 cycles, where n is the number of
exposures. The top row of the figure shows 8 images differing only in exposure; the expansion
functions are used to obtain intermediate results, which are recompressed to produce the corresponding
thumbnails. Note that this method works well only for showing the intermediate steps, giving the viewer an
intuitive sense of how the algorithm works step by step. For greater computational efficiency, and possibly data
accuracy, the method described earlier in this chapter is recommended. Note that there are some artifacts
present in these thumbnails, which show as outlines on the waves, caused mostly by the lack
of a tone mapping implementation in the HDR program. [54]
Notice in the first row of the figure, the raw exposures show underexposed to overexposed images from
left to right. The underexposed images feature a small dynamic range, as only the brightest parts of the
scene are captured, while the overexposed frame creates more or less a mask of the veillance field.
Decompressing the pixel values into photoquantities in a pairwise manner allows the photoquantity
values themselves to be combined. Details from all the exposures are accounted for and adjusted by
exposure weight when they are combined. The details accumulate through this process, visible in each
successive row of the figure. Due to the monotonic assumptions of the compression model used, the order
of brightness is maintained throughout the frames.
Figure 33: HDR composite of multiple exposures of an LED collected by a photosensor, using the recursive
pairwise comparametric image compositing algorithm. This algorithm uses comparametrics to estimate light
quantities, which have a higher dynamic range than the sensor's interpretations as images. The HDR image
preserves the order of brightness, as the response model is a monotonically non-decreasing function.
Notice the final image, where the effect of the HDR is much more apparent; the change in light intensity
is much more gradual, with a significant reduction in under- and overexposure. Notice that the order of
brightness is preserved from the original images. There is also little to no loss in detail, from both the
low and high exposure frames.
2.4.2 Simplified HDR method during runtime
Given that the experiment is conducted in a darkroom with no other light source, the environmental
background lighting is negligible. A lightweight alternative HDR method is proposed and used
extensively to expand the data's dynamic range and to eliminate overexposure in the photos, given that
at least one photo is not overexposed and the camera response function is known from the previous
method (the parameters a and c are already computed for this particular camera setting).
Algorithm:
1. Obtain multiple exposures of the subject matter. Consecutive frames' exposure times must form a
geometric series. At least one image must not be overexposed.
2. Create an empty array with the same dimensions as the images. Set all entries to -1.
3. Starting with the most exposed image, traverse every pixel and color channel. If the corresponding
array entry has a non-negative value, continue to the next element. Otherwise:
a. If the pixel value is greater than some high-exposure threshold beyond which the data
becomes uncertain (for example, value > 200), the data in this exposure is
discarded and the -1 entry is left unaffected.
b. Otherwise, when the pixel value is smaller than the preset threshold, a value is written into
the entry: the product of the pixel value and the reciprocal of the exposure ratio this
image has against the baseline (lowest exposure) image.
4. Repeat step 3 with the next lower exposure image, continuing down to and including the baseline
image. Thresholds may be adjusted if pixels of -1 still exist after all frames. An
error is produced when all frames are overexposed.
5. The resultant image, once normalized and decompressed, is the HDR composite.
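A minimal single-channel sketch of the five steps above (assuming consecutive exposures differ by a factor of 2, and omitting the final LUT decompression for brevity):

```python
import numpy as np

def simple_hdr(frames, threshold=200):
    # frames: list of uint8 single-channel images ordered from LOWEST to
    # HIGHEST exposure, each frame double the exposure of the previous one.
    out = np.full(frames[0].shape, -1.0)            # step 2: -1 sentinel
    for i in range(len(frames) - 1, -1, -1):        # steps 3-4: most exposed first
        ratio = 2.0 ** i                            # exposure vs. baseline frame
        frame = frames[i].astype(float)
        usable = (out < 0) & (frame <= threshold)   # steps 3a/3b
        out[usable] = frame[usable] / ratio         # scale back to baseline
    if np.any(out < 0):
        raise ValueError("all frames overexposed for some pixels")
    return out
```

In the toy case below, the second pixel saturates in the high-exposure frame (210 > 200), so its value is taken from the baseline frame instead.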
This method extends the earlier reconstruction method and avoids the extensive, repetitive computation of
curve-fitting the camera response function by gradient descent, so that lengthy process is
done only once per camera setting. Instead, it directly estimates the photoquantities through a discretized
(0-255) lookup table (LUT). Steps are taken to ensure the pixels used in the table are, as much as possible,
within the region of model certainty before pixels outside that range are used. The
advantage of this implementation is that decompression runs much more quickly, as long as there is a
good spread of exposures captured. Furthermore, this method better eliminates uncertain data.
2.5 Gathering audio veillance data
Lastly, the audio veillance data is to be measured. Unlike photoquantities, audio signals travel as wider,
more slowly decaying waves. Although the amplitude of the waves is the focus of the study, the phase of the
wave as a function of space might also be interesting for visualization purposes.
2.5.1 Experimental setup
A few additions are made to the two dimensional plotter setup, previously discussed, to
measure the amplitude and phase of sound waves as a function of space. As shown in figure 34, a 40
kilohertz sine wave is sent from a signal generator to a transducer positioned on the moving platform of
the plotter. On the other end of the plotter, another transducer is mounted on the side, acting as a
stationary receiver.
Figure 34: Experimental setup for audio veillography. [15] This setup allows both the amplitude and phase
information to be recorded by a machine, using a lock-in amplifier (LIA), low-pass filter (LPF), and analog to
digital converter (ADC). The recorded data will be a function of space rather than time. A connected computer
can use the real and imaginary vectors of the perceived wave to produce a color output on a connected RGB LED,
for debugging or abakography.
The output of the receiver is connected to the signal input of a lock-in amplifier (LIA), with the reference
input from the same signal generator. The real and imaginary components of the wave produced by the
LIA are connected to a low pass filter (LPF) and then an analog to digital converter (ADC), and finally
into a connected machine. The machine sends two sets of data to the plotter: the plotter housing
location, as discussed earlier, and RGB values for an LED attached near the moving transducer. The LED
system mainly acts as a real time debugging tool to display audio information visually, by producing
abakographic photographs.
2.5.2 Time invariant waves - “Sitting waves”
Sitting waves are a wave phenomenon, coined by Mann, that is time invariant and only a function of
space. [56] Consider a generic radio or sound wave travelling along the direction x, at time t, where w is
the natural frequency of the wave in radians per second, and A (wave amplitude) and k (spatial sensitivity)
are constants:

a(t, x) = A cos(wt - kx)

Assume that this is the origin reference of the signal, with x = 0. Simplifying a(t, 0):

a(t, 0) = A cos(wt)
It is apparent that the wave function a is still a function of time. If the location of the receiver is fixed, then
the signal received would oscillate as a function of time. The signal received by the receiver, r(t, x), would
be the same as a(t, x):

r(t, x) = A cos(wt - kx)

By feeding these two signals into a lock-in amplifier, the signals are effectively mixed, producing a
new signal that can be generalized as:

s(t, x) = a(t, 0) r(t, x) = A^2 cos(wt) cos(wt - kx)

Using the trigonometric identity cos(u)cos(v) = [cos(u - v) + cos(u + v)] / 2:

s(t, x) = A' [ cos(kx) + cos(2wt - kx) ]

where A' is a constant, which can be rescaled to produce a unit amplitude. Now passing the signal s(t, x)
into a low pass filter, the high frequency component at twice the 40 kHz carrier frequency is eliminated,
leaving the output:

o(x) = A' cos(kx)
This is a time invariant wave; the amplitude of the signal measured is only a function of space.
The setup above allows the precise amplitude and phase of the wave to be computed and recorded as real and
imaginary components. The wavelength of the waves can be measured as the receiver is moved from one
peak to the next. Figure 35 shows an abakographic photograph of this setup used to measure the speed of sound,
which was measured as 350 m/s, only a 0.86% error from the ground truth at a room temperature of 27
degrees Celsius. [57]
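A numerical sketch of the lock-in measurement above, assuming cosine wave forms and using a long time average in place of the low pass filter:

```python
import numpy as np

# Assumed forms: transmitted wave a(t, x) = A*cos(w*t - k*x); reference a(t, 0).
w = 2 * np.pi * 40e3                 # 40 kHz carrier
k = 2 * np.pi / 8.75e-3              # wavelength ~8.75 mm at ~350 m/s
A = 1.0
t = np.linspace(0.0, 1e-2, 400_000)  # 10 ms of samples (many carrier periods)

def lockin_output(x):
    mixed = A * np.cos(w * t) * A * np.cos(w * t - k * x)  # mixer stage
    return mixed.mean()              # averaging acts as the low-pass filter

# After the LPF, only the time-invariant term (A^2 / 2) * cos(k*x) survives:
xs = np.linspace(0.0, 8.75e-3, 9)
out = np.array([lockin_output(x) for x in xs])
assert np.allclose(out, 0.5 * np.cos(k * xs), atol=1e-2)
```

The output depends only on x, matching the sitting-wave result: moving the transducer by one wavelength traces one full period of cos(kx).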
Figure 35: An abakographic photograph showing the amplitude of a time invariant sound wave. [57] The
wavelength of these waves can be measured by moving the SWIM device along a path of sound propagation and
noting the peaks. Using this information, the speed of the sound wave can be confirmed with good accuracy.
2.6 Summary
In this chapter, the various laboratory setups for measuring multiple modes of veillance are discussed,
along with multiple plotting tools and their working principles. Furthermore, the concepts of
photoquantity and high dynamic range image processing are discussed. Lastly, time invariant sitting waves
are introduced to measure wave veillance as only a function of space.
Chapter 3: Data visualization and analysis
This chapter will explain the color mapping procedure for visualizing veillance data, turning a large
matrix array into an image that is understandable by humans. The images produced should also be easy to
compare across different modalities, and/or different products of the same modality. The mapping
process will be discussed in detail, and the various data obtained in the previous chapter are shown. This
chapter is organized into two sections, one illustrating video data and the other audio. The
chapter also includes data analysis and validation.
3.1 Visualizing video veillance data
This section explores the visualization of video veillance data. How raw data is interpreted
and converted to an image is also covered, with examples.
3.1.1 Data preparations and color maps
As explained in earlier chapters, photoquantities are estimated from the images captured by a camera
using the simplified HDR approach. The data produced is of a higher dynamic range and needs to be
reduced to 8 bits per channel before it can be visualized on monitors. Given the estimated, normalized
photoquantities q, the recompressed image value is computed as:

f(q) = ( q^a / (q^a + 1) )^c

for constant values of a and c derived from earlier work in previous chapters. This function
results in a real number between 0 and 1 for all photoquantities, which are positive real. These numbers
are then applied to a map from the OpenCV library, as shown in figure 36, to produce a color mapped image.
Figure 36: Figure showing the schema of two of OpenCV's default colour maps. COLORMAP_HOT is the
mapping used for visualizing veillance data, while COLORMAP_JET was used for earlier audio veillance work.
The colormap function takes the input and returns a colored pixel. Low photoquantities will result in a mapping
closer to 0 [black], middle levels of photoquantities will map to orange, and very high photoquantities will be
mapped to white.
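As a sketch of this recompression-and-mapping step, the following approximates a "hot"-style map in plain NumPy (the exact OpenCV map differs; in practice cv2.applyColorMap with cv2.COLORMAP_HOT would be used):

```python
import numpy as np

def hot_colormap(x):
    # Approximate 'hot' map: black -> red -> orange/yellow -> white.
    # x: normalized values in [0, 1]; returns uint8 RGB.
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    r = np.clip(3.0 * x, 0.0, 1.0)
    g = np.clip(3.0 * x - 1.0, 0.0, 1.0)
    b = np.clip(3.0 * x - 2.0, 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

# low photoquantities map near black, high photoquantities near white
rgb = hot_colormap([[0.0, 0.5], [0.9, 1.0]])
```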
3.1.2 Video camera veillance
The color mapped image resulting from the measurements of the previous chapter is shown below
in figure 37. Note that this image has a resolution of 250 pixels wide by 200 pixels high. Each pixel
represents a physical location of the LED on the moving platform of the 2D plotter; the plotter therefore
traversed 250 vertical strides, with 200 samples on each stride. The photoquantity shown is the
sum over all the pixels of the camera, to establish a general estimate of the entire camera's veillance field. In
later chapters, the veillance fields of individual camera pixels will be studied.
Figure 37: computer generated visualization of the veillance field for the PointGrey BlackFly camera, based on
data obtained from a 2D plotter.
To expand veillance visualization to three dimensions, the delta plotter introduced in section 2.2.2 is
employed. A digital camera is placed underneath the plotter, and an LED is swept across the space above
the camera. The scanning density is reduced, since detailed data would have taken an extremely long
time to gather. Figure 38 illustrates a scatterplot of the veillance data points that are greater than some
threshold value, to avoid plotting excessive points, which can obstruct more informative data. The color
mapping of the points has 0 for the red and blue channels, and the green value is the ratio of the
photoquantity of that point to the brightest spot in the entire dataset, multiplied by the color
depth. This generates a 3D point cloud with varying intensities of green. Note that only a portion of the
veillance field is shown: the region the plotter could reach. The figure below is shown at an angle of
roughly 65 degrees with respect to the orientation of the camera. Thin red lines are overlaid on the figure
to help identify the corners of the veillance field of the camera.
Figure 38: a 3D point cloud of veillance data collected for a digital camera using the delta plotter. The green points
indicate the level of veillance (dark green is low veillance and bright green corresponds to higher veillance values).
The red lines are drawn to help indicate the edges of the field of view of the camera.
3.1.3 Analyzing photodiode veillance data with optics model
This section examines the data obtained for photosensors in a camera system. It aims to generalize and
model the photosensor data into a quantified formulation, establishing the quantified relationship between
veillance power and space. Although not explicitly shown in the analysis, similar computations can be
applied to model audio veillance.
Given the camera veillance data gathered, the visualizations indicate that veillance propagates
from the source and degenerates as a function of distance. Therefore, it makes more
sense to model veillance in polar coordinates. Since the camera has a discrete, directional sense, veillance
power can be described by a vector from the source. There are three degrees of freedom in these models:
the x, y, and z coordinates of a vector from the source in cartesian coordinates, or the distance (r), the
vertical component angle of the vector (φ), and the horizontal component angle (θ).
Start with figure 39, the visualization of the experiment described in earlier chapters. The
experiment consists of a photodiode placed inside a camera casing, with a diffused LED stimulus
swept in front of the sensor.
Figure 39: a 2D veillance visualization of a photodiode exposed to an LED stimulus, side cross-section view. The
row number of the sensor axis used for analysis is shown in the corner, with the row colored in blue.
From the figure showing the side cross-section of the veillance field, the principal (optical) axis of the
camera is selected for study. Since veillance fields have mostly identical optical properties, such as
intensity degeneracy, the hypothesis is that the veillance power will decrease with the inverse square of the
distance away from its source. This is because the surface area for the same amount of veillance exposure
increases proportionally to the square of the distance away from the source. Having measured the
dimensions of the plotting range, and the number of pixels recovered horizontally during the scan, the
relationship between pixels and physical measurement in centimeters is obtained. The veillance power,
P(r), where r is the radius in centimeters, is proposed as:

P(r) = k / (qr - A)^2 + c
where k, q, A, and c are constants, and the singularity point r = A/q, at which P(r) is undefined, marks the
exact source of veillance. Using gradient descent techniques, the model is fitted as closely as possible to the
data points. The model is shown in figure 40 as a red curve, while the data points are drawn as blue
triangles.
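As a sketch of this fitting step: the exact functional form of P(r) was not preserved in this transcript, so the model below is a plausible reading consistent with the stated constants k, q, A, c and a singularity at r = A/q; the fitting itself could use gradient descent or any least-squares routine:

```python
import numpy as np

def veillance_power(r, k, q, A, c):
    # Assumed inverse-square model: P(r) = k / (q*r - A)^2 + c,
    # singular at r = A/q (the estimated veillance source).
    return k / (q * r - A) ** 2 + c

def rms_error(params, r, data):
    # objective to minimize when fitting (k, q, A, c) to measured points
    return np.sqrt(np.mean((veillance_power(r, *params) - data) ** 2))

# synthetic sanity check: data generated from known parameters fits exactly
r = np.linspace(5.0, 50.0, 40)
true = (3.0, 1.0, 2.0, 0.1)
data = veillance_power(r, *true)
assert rms_error(true, r, data) < 1e-12
```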
Figure 40: a plot of the veillance strength as a function of the receiver's radial distance from the source. The
inverse square model is shown as a red line, and the blue markers represent the data points. Each data point is a
pixel from the line shown in figure 39.
Computing the model error E(model, prediction) as:
where all the data and model values represent positive photoquantities. The relative error between the
model and data is 24.84%. Looking closer at the data, there is a predictable zigzag pattern, as shown in
figure 41, causing this prediction error to be much higher than expected.
There are multiple possible sources of this relative error, such as coarse sampling resolution, or a directional
asymmetry of the plotter, which may have caused the sensor to register different values travelling in one
direction versus the other. Given this highly repetitive pattern, it makes sense to accommodate this data and
come up with a more optimistic error model:
which yields an optimistic relative error of 1.41%. Visually and statistically, this model is reasonably
accurate in predicting the veillance power as a radial function along the optical axis. But will this model
hold when the vertical angle (φ) changes? To test whether this model holds, two more lines are selected
from near the center as test subjects, illustrated in figure 42.
Figure 41: A closer look at the distance versus veillon reading plot (figure 40). The sample data is now shown as
blue line-connected points. The data indicates a very persistent and predictable zig-zag pattern.
Figure 42: Two more lines of data are used to test the accuracy of the established model. These two lines originate
from the predicted singularity point (sensor location) and have slopes of 1 and -1.
The two selected lines originate from the predicted singularity point of the model and have slopes of 1
and -1. There are two reasons for these choices. First, it is much easier to extract diagonal data from
the original matrix, and second, these angles are two extremes away from the optical axis, yet not so
extreme that they leave the field of view of the camera.
To select the data, start with the index (q, p), where q is the row number at which the line intersects the
left side and p = 0; (q, p) is the first point. Subtracting 1 from q and adding 1 to p gives the index of
the next point, and points are generated until q reaches 0. Similarly, the negative sloped line can be obtained just
like the positive one, with the addition operations reversed. Figure 43 shows the fitted curve as the red line,
and the blue line represents the data points of the positive sloped line. The established model has about
14.9% error, mostly due to inaccurate estimation of the sensor center, which is in fact about 1-2
centimeters away from the model prediction. To conserve space, graphs for the other slope are not shown
here; the model error for the other line is approximately 18.1%.
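The diagonal extraction described above can be sketched as follows (the function name diagonal_from is illustrative):

```python
import numpy as np

def diagonal_from(matrix, q, slope=+1):
    # Walk a 45-degree line from (row=q, col=0): decrement the row (slope=+1)
    # or increment it (slope=-1) while the column advances, as in the text.
    rows, cols = matrix.shape
    points = []
    p = 0
    while 0 <= q < rows and p < cols:
        points.append(matrix[q, p])
        q -= slope
        p += 1
    return np.array(points)

M = np.arange(16).reshape(4, 4)
# from row 3, column 0 upward: M[3,0], M[2,1], M[1,2], M[0,3]
assert np.array_equal(diagonal_from(M, 3), [12, 9, 6, 3])
```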
Figure 43: the plot of veillance strength measured versus the receiver distance at 45 degrees above the optical axis.
The model still holds well to the data points (14.9% error), indicating that the data depends much more on the
radial distance than on the angle of the vector in question.
It is fairly safe to say, with the data from three different runs, that the model holds for this particular camera,
with the input solely a function of the radial distance between the sensor and the receiver on that vertical plane.
Now that it is established that the veillance strength is largely independent of the vertical angle (φ),
can the same be said of the horizontal angle (θ)?
In order to answer the above question, another set of these lengthy experiments would need to be done with the
camera rotated 90 degrees, reversing the roles of theta and phi. However, due to the symmetrical nature of the
camera and its optics, perhaps the question can be answered qualitatively rather than quantitatively. Figure
44 illustrates another run with the same camera, but with the camera placed on a tripod in front of the
plotter, looking into the plotted regions from the front. The left side of the figure shows
the pixel average of the LED during the entire run, leaving a fine threaded path of light. The right side
image is computer generated, with each pixel representing the sum of photoquantities seen by the
camera at that stimulus location. Notice that the decay in veillance strength is more spherical than oval.
Although this might not be the case for every camera, this evidence supports the hypothesis that
the veillance power is solely dependent on the radial distance between the stimulus and the receiver, with
very little to no dependence on the angle, as long as the direction falls inside the field of view of the
camera. This conclusion applies only to the camera tested; more data is needed to generalize to all cameras.
Figure 44: Left: a computer generated image showing the entire LED path as it was swept in front of a camera,
which is the subject of the veillance study. Note that there is some barrel distortion from the camera. Right: a
computer generated image where each pixel represents the sum of all photoquantities perceived by the camera from
the LED, at a particular location of the LED. The pixels have the same order as the LED locations. The pixels are
normalized to the highest photoquantity before being colour mapped. The veillance map visually indicates a strong
relationship between the veillance power and the radial distance.
The above section therefore confirmed the inverse square law between the veillance power of a sensor
and the radial distance of the sensor from the stimulus. There is also very little change in the
model when the stimulus moves away from the optical axis, horizontally, vertically, or both.
Conclusively, the video camera's veillance field can be thought of as a collection of vectors uniformly
distributed (disregarding barrel and pincushion effects of the camera) inside the field of view of the
camera, each propagating veillons outward into space with the same amount of power, with the
amplitude of this power degenerating proportionally to the inverse square of the distance between the sensor
and the stimulus. This finding is based on qualitative and quantitative observations of the veillance power
recorded.
As veillametry is still an active field of research, a sensor's detailed ability to sense stimuli through space is not well documented in the literature, nor is it part of the technical specifications for cameras, such as those found for the Blackfly camera employed. [58] For cameras with adjustable focal length, or sensor-lens separation distance, the field of view varies with the sensor-lens distance and is therefore a variable not outlined in specification sheets. [59] Figure 45 shows the field of view visualized for one optical system setting. Even for cameras with a fixed field of view, the specification is typically just an angle describing the mask of veillance, which is analogous to the metaveillographs of previous works, quantitatively inaccurate versions of the quantified metaveillogram.
Figure 45: the field of view of an optical system depends on multiple factors, such as sensor size and sensor-lens separation distance. The figure is sourced from an online reference. [58]
3.2 Visualizing audio veillance data
Similar to the video veillance, this section explains how the real and imaginary components of the wave data are interpreted as images, and applies these techniques to previously recorded data. The section also illustrates the effects of wave interference.
3.2.1 Data preparations and color maps
Given the real and imaginary components produced by a lock-in amplifier, further processing is carried out before the data can be mapped into an image. The information to be mapped is the relative magnitude and the phase of the wave received at a particular point in space. The magnitude of the wave M, as a function of the real (r) and imaginary (i) components, is computed as:

M(r, i) = √(r² + i²)

And the phase of the wave 𝜃(r, i) is computed as:

𝜃(r, i) = atan2(i, r)
When mapping these data points into a colour map, the magnitude determines the degree of interpolation between black (RGB = 0,0,0) and the colour that corresponds to the phase of the data point. For example, if the magnitude is near 0, the pixel appears very dark, close to black. The phase itself determines what colour the pixel holds before correcting for its magnitude.
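As a concrete sketch of this mapping, the following converts one lock-in sample to a pixel; it assumes a simple HSV hue wheel rather than the customized look-up table described later, and the function name is illustrative:

```python
import math
import colorsys

def wave_to_pixel(r, i, max_magnitude):
    """Map a lock-in amplifier sample (real r, imaginary i) to an RGB pixel.

    The phase selects a hue on a simple colour wheel, and the magnitude
    interpolates that hue toward black, as described in the text.
    """
    m = math.hypot(r, i)                     # magnitude M(r, i)
    theta = math.atan2(i, r)                 # phase in (-pi, pi]
    hue_deg = math.degrees(theta) % 360.0    # wrap phase to [0, 360)
    rr, gg, bb = colorsys.hsv_to_rgb(hue_deg / 360.0, 1.0, 1.0)
    scale = 0.0 if max_magnitude == 0 else min(m / max_magnitude, 1.0)
    # Magnitude interpolates between black (0,0,0) and the phase colour.
    return tuple(int(round(255 * c * scale)) for c in (rr, gg, bb))
```

A full-strength sample at phase 0 maps to pure red, while a zero-magnitude sample maps to black regardless of phase.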
Figure 46: abakographic overlay of audio veillance data on top of the plotting apparatus. The veillance waves in the test shown are fairly low in frequency, and are directed towards the speaker. [15]
Since audio waves are sinusoidal, the colour mapping also needs to be periodic. The colour mapping schemes of figure 39 no longer apply, and the mapping needs to be more of a colour wheel than a colour bar. Figure 46 shows one of the possible mappings. Although continuous, that spectrum weights blue and orange too heavily, while the transitions through green and yellow are far too rapid. One way to generate colour wheels is to specify exactly which colours certain phase angles should hold, with intermediate values interpolated between these customized anchors. By spreading the primary and secondary colours as evenly as possible, an improved colour wheel is generated, as shown in figure 47. The figure shows a program that allows the user to set the angles at which certain colours must occur.
Figure 47: a program written to generate colour wheels, or colour look-up tables, that optimizes the colour spread of the audio data. This is done by hard coding certain phases with specified colours. The phases are specified relative to red, which sits at 0 degrees. Any in-between values are interpolated from the two nearest colours.
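The anchor-and-interpolate scheme described above can be sketched as follows; the anchor angles and colours below are illustrative, not the exact values used by the program in figure 47:

```python
def build_colour_wheel(anchors):
    """Build a 360-entry RGB look-up table from (angle_deg, rgb) anchors.

    Colours are pinned at the given phase angles (red at 0 degrees by
    convention), and every in-between angle is linearly interpolated from
    its two nearest anchors, wrapping around 360 degrees.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    lut = []
    for deg in range(360):
        # Find the pair of anchors bracketing this angle (with wrap-around).
        for k in range(n):
            a0, c0 = anchors[k]
            a1, c1 = anchors[(k + 1) % n]
            span = (a1 - a0) % 360 or 360
            offset = (deg - a0) % 360
            if offset < span:
                w = offset / span
                lut.append(tuple(
                    int(round((1 - w) * c0[j] + w * c1[j])) for j in range(3)))
                break
    return lut

# Example anchors: red at 0, green at 120, blue at 240 degrees.
wheel = build_colour_wheel([(0, (255, 0, 0)), (120, (0, 255, 0)),
                            (240, (0, 0, 255))])
```

Spreading the anchors evenly, as the text suggests, prevents any one hue from dominating the wheel.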
3.2.2 Audio veillance
Applying the above preparations, various visualizations are generated and shown as figures 48 to 51, with detailed captions. Some of the visualizations also show the constructive and destructive interference patterns of waves.
Figure 48: [15] Left: abakographic overlay of the signal received by two microphones connected in parallel, while a speaker transmits sinusoidal waves as it moves across the space in front of the sensors. Right: similar setup with 6 transducers connected in parallel. These two figures show the interference nature of waves, as lines of antinodes (constructive interference) and nodes (destructive interference) are clearly visible on the overlay.
Figure 49: Veillance renders of identical audio data, where the second frame has an angular phase offset of 18 degrees from the first, and the third an offset of 18 degrees from the second. Iterating the phase offsets and saving them as frames can be used to generate a graphical interchange format (GIF) image that animates the waves from only one set of data. The successive frames, when animated, give the illusion of a moving wave.
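This phase-offset animation amounts to multiplying every complex sample by a unit phasor once per frame, which advances the phase while preserving the magnitude. A minimal sketch, with an illustrative function name:

```python
import cmath

def phase_shift_frame(samples, offset_deg):
    """Rotate the phase of every complex sample by offset_deg.

    Multiplying each sample by exp(j*offset) leaves the magnitude
    untouched while advancing the phase, so successive frames (e.g.
    18 degrees apart, as in figure 49) animate a travelling wave from
    a single recorded data set.
    """
    rot = cmath.exp(1j * cmath.pi * offset_deg / 180.0)
    return [z * rot for z in samples]
```

After twenty 18-degree steps the phase has advanced a full 360 degrees, so the animation loops seamlessly back to the original frame.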
Figure 50: computer generated visualization of a high quality microphone, using a slightly modified colour wheel. The microphone is more sensitive to changes in sound near the sensor and slightly less so further away, but its capacity decreases slowly through space compared to the other audio sensors used.
Figure 51: Left: thresholded 3D visualization of three transducers aligned in parallel, using the colour scheme mentioned in the previous section. The three transducers are mounted on the top side of the delta plotter, emitting sinusoidal waves of the same frequency. Right: a horizontal cross-section of the model to the left.
3.3 Summary
This chapter described the process of visualizing veillance data for photosensors and audio sensors. For photosensors, the photoquantity is compressed and fitted into a colour map. For audio sensors, the real and imaginary input components are converted to polar coordinates; the magnitude maps to the brightness of the pixel, and the phase maps to its colour. The chapter concluded by showcasing various laboratory data, some of which showed interesting properties such as wave interference.
Chapter 4: Veillons and extramission theory
This chapter introduces the terms veillon and vixel, along with the extramission model, and shows how these concepts can be applied to the veillance framework established in previous chapters to compare the extramission of sensory veillance flux with the propagation of light rays.
4.1 Veillametry and the extramission model
This section recalls the concept of veillance fields and introduces the ideas of veillons and vixels. These terms are defined within the context of an extramissive optics model, which is also introduced here.
4.1.1 Veillance definition
Veillametrics, or veillametry, is defined as the study of the ability to sense. [18][60] Veillametry quantifies the spatial relationship between the information received by a sensor (information sensitivity) and the location of the stimulus (which can be a light or audio source) as veillance fields. The veillance field is an intrinsic property of the sensor itself, independent of the stimulus and its level of output. For example, a camera has the same ability to sense information along its optical axis even if it is placed in complete darkness.
4.1.2 Extramission model, veillons, and vixels
Since veillance is defined as an intrinsic property of the sensor, depicted as vector arrays (in the case of a digital camera) or waves (in the case of a microphone) directed into space, it is logical to model veillance fields as outward waves or rays emanating from the sensor. This leads to the application of the ancient extramission theory advocated by Plato more than two millennia ago. [61][62] Emission theory, or extramission theory, proposed that visual perception is accomplished by beams emitted by the eyes, as illustrated in the left image of figure 52. Although this theory is now known to be incorrect, and has been replaced with intromission theory, which states that visual perception comes from a light source whose light enters the eyes directly or by reflection, the extramission theory describes a reasonable way to model sensory fields. In the extramissive model, one can imagine veillance fields analogously to the ray-tracing principles used in computer graphics: the direction of light propagation is reversed, but fundamental optics principles still apply. The main advantage of using an extramissive model is that it allows one to account for the effects of degeneration and absorption of the ability-to-see as it propagates through space. [63] The right side of
figure 52 suggests veillance emitted from a camera, like ammunition shot from a gun or paint from a spray can travelling through space. The photograph was produced using long exposure techniques.
Figure 52: Left: Illustration by Ibn al-Haytham, 11th century, showing the extramission theory of light held by Plato, in which light is emitted from the eyes of the observers onto the object, sourced from the web. [18][61][62] Right: A long-exposure photograph visualizing the veillance field (field of view only) of a hand-held veillance camera gun. [64]
While the qualitative definition of veillance is set, the unit and metrics of a veillance field, or the ability-to-see through space, are yet to be established. In the context of a camera under the intromission theory, the amount of light received by a photographic sensor can be measured using a type of elementary particle known as a photon, a quantum of an electromagnetic field such as light. [65] Moving towards the extramissive model, where the direction of light propagation is reversed, the unit for measuring veillance could be closer to a 'darkon': the movement of a photon in the negative direction. Analogous to the relationship between electrons and holes, where an electron is a carrier of negative charge and a hole is defined as the absence of an electron, the darkon is to the photon as the hole is to the electron. [66][67][68]
However, the measurement of a time-reversed photon, or darkon, does not correctly quantify veillance fields. As established before, the ability-to-see of a sensor should be independent of stimuli and their output levels. The quantum of sensory ability should retain the light-propagation properties of a darkon, but be emitted by the sensor, such as a camera, at all times, regardless of the presence or level of stimulus output. Furthermore, the notion of a darkon violates the cause-and-effect, or causality, of the
relationship between the sensor's input values and the level of stimulus output. In other words, pointing a camera towards an unlit object will not cause that object to receive or emit any quantum of light.
With this in mind, the thesis adopts the veillon as the unit of veillance. The definition previously set by Professor Steve Mann and Ryan Janzen of a veillon is: one sensitivity-bearing quantum of a one-time sample (frame) from one pixel, irradiated outwards from a camera and propagated through space while obeying fundamental optics laws such as reflection, independent of the amount of light present. [18]
As a veillon propagates through space, it expands in size and may eventually be projected onto one or more surfaces; at the same time, the power concentration of the veillon within its vixel diminishes. This is very similar to the behaviour of paint propelled by a spray can. The vixel is an important product of veillametry, defined as the spatial surface that contributes to the readings of one pixel of the sensor. As an example, if a flashlight is considered a single-veillon emitter, then the area illuminated on the ground can be considered one single vixel corresponding to that veillon and pixel. [18] Figure 53 summarizes the photon, the darkon, and the veillon. As discovered in later chapters, the distribution of veillance power over a vixel surface may not always be uniform.
Figure 53: A summary of the photon and the darkon, leading to the veillon, the unit of veillance, or the ability-to-see through space, from Janzen and Mann's paper. [18] The darkon is defined as the time-reversed propagation of a photon, but is dependent on stimulus output levels. Therefore, this thesis adopts the veillon as the measurement of veillance.
Although most of the analysis in this thesis is based on camera veillance, the concept applies in a similar fashion to other modes of sensing. For example, in the case of a microphone, an audio veillance wave is emitted from a single audio sensor and propagates outwards following wave propagation and diffusion properties throughout space, independent of the output of stimuli such as speakers. Instead of a veillon for every pixel as in a camera, a single veillon is defined for the entire microphone per sampling period.
4.2 Veillance flux density, degeneracy, and energetic optics comparisons
The human eye is arguably one of the most complicated sensors in existence, containing nearly 120 million rods (which function at low light levels as sensitive receptors) and about 7 million cones (which function at high light levels and can distinguish colours). [69] Even with this massive number of photoreceptors in the retina, the count is still finite, giving a visual acuity of 0.013 degrees between cone centers. [70] Visual acuity is defined by Merriam-Webster as the relative ability of the visual organ to resolve detail, expressed as the minimum angular separation, in minutes, of two lines just resolvable as separate. [71] In other words, when two particles are placed too closely, they are optically mapped to the same photoreceptor; each may contribute to the value or colour perceived, but they are indistinguishable from each other.
The digital camera is analogous to the eye in many ways: light enters through a variable diaphragm and a lens with adjustable focus (biological or mechanical), and the cones and rods correspond to photosensors such as CCD (charge-coupled device) or CMOS (complementary metal-oxide semiconductor) sensors, both converting light energy into electrical signals. [72] However, cameras have a more severe problem with visual acuity: a typical lens such as the Coolpix 5000 at 7.1 mm offers about 75.5 degrees of horizontal field of view, which, on a 1024-pixel-wide camera, yields about 0.074 degrees of visual acuity between sensor elements. [73] Visual acuity at a fixed distance from the observed subject matter defines the spatial resolution of the image. Spatial resolution refers to the size of the smallest feature that can be detected. When the subject material (in a pinhole camera model, disregarding focusing effects) is brought closer to the aperture, the image covers only part of the whole scene but has a greater spatial resolution, as it can discern finer details given its finite image resolution. For example, in figure 54, [74] given the same aerial camera, when the camera is closer to the ground it has greater spatial resolution, discerning structures like houses and cars (right), but it covers only a small area of land compared to the image taken further from the ground (left). Given the same image resolution (veillance vectors), the veillance flux density is much greater in the right image than in the left, in terms of independent discernable veillons per unit area of surface.
In a way, the veillance flux density can be thought of as the inverse of spatial resolution, with spatial resolution measuring the surface area per pixel (vixel area) and flux density measuring the number of veillons per unit area. Geometrically, the intensity decay is proportional to the inverse square of the distance between the sensor and the subject.
Figure 54: Aerial photographs of an urban area taken from an online source. [74] Left: the image is taken at a higher altitude than the image to the right; with its coarse spatial resolution, large-scale features like city blocks and airports can be identified. Right: using the identical camera at a lower altitude, the image shows only a portion of the left image, but finer features like cars and houses can be identified.
In a video context, the veillance flux rate can be computed, assuming that all pixels map to independent vixels, as the product of the image resolution and the frame rate of the video camera. The veillance flux rate is measured in veillons per second.
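These quantities reduce to simple arithmetic; the following sketch uses the figures quoted in the text, and the function names are illustrative:

```python
def visual_acuity_deg(fov_deg, pixels):
    """Angular separation between neighbouring sensor elements, in degrees."""
    return fov_deg / pixels

def veillance_flux_rate(width_px, height_px, fps):
    """Veillons per second, assuming every pixel maps to an independent vixel."""
    return width_px * height_px * fps

# Coolpix example from the text: 75.5 degrees of horizontal field of view
# spread across 1024 pixels gives roughly 0.074 degrees per sensor element.
acuity = visual_acuity_deg(75.5, 1024)
# A hypothetical 640x480 camera at 30 frames per second.
rate = veillance_flux_rate(640, 480, 30)
```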
Effective veillance, in this context, is the independent degree of sensitivity to stimulus information through space. Veillance can also be affected by degenerative optical properties. For example, if condensation or frost occurred on the lens, causing light diffusion so severe that all pixels return the same value, then effectively there is only one veillon per frame. In the case where a non-interfering grating pattern placed in front of the camera leaves half of the pixels always black, the veillance count is reduced by half. There are many similarities between a veillance flux source and a point light source under the extramission model, including reflection, the inverse-square degeneracy relationship with distance, and other optical properties. Figure 55 graphically demonstrates some comparisons between veillance flux and a light energy field from Janzen and Mann's earlier work. [18]
However, there are also differences between the extramissive information sensing and intromissive optics models. One is how sensitivity to unique information may be reduced by effects such as diffusion, whereas energy is conserved at all times. Such effects further differentiate veillance radiation from a simple time reversal of optical radiation. Another difference is that veillance flux is discrete and often directional, and is not necessarily uniform for all pixels (veillance vectors). In addition to physical
obstructions, optical degeneracy, optical imperfections such as lens aberration, chromatic aberration, optical vignetting, pincushion or barrel effects, and other factors, the flux of sensors (especially arrays of sensors, such as a digital camera) can be a rich, expressive function of sensor strength through space rather than just a sight cone.
Figure 55: A side-by-side comparison of veillance flux (the sensitivity vectors to surface information) and light from a point source. Left: some properties of veillance flux, such as reflection, diffusion, and inverse-square flux density. Right: similar properties held by a point source of light, with diffusion as a difference. [18]
4.3 Summary
This chapter introduced and contrasted veillance sensory flux with light propagation in space. In this thesis, the extramission or emission model is used to model the sensory flux emitted from a sensor, similar to that of a spray can. Note that it is only the ability to sense, or the sensitivity to light stimuli, that is emitted from the camera, and not light itself, which preserves causality. The chapter also introduced the concept of vixels as the spatial areas that contribute to the corresponding pixels' sensor readings.
Chapter 5: Veillograms
Now that the veillance model for sensors such as the video camera is established and reasonably quantified, these models can be used to generate accurate veillograms. Veillograms are a quantified measurement of the sensory perception emitted by sensors over physical surfaces in 3D space, as if these surfaces were photographic film. If an analogy is helpful, imagine the camera as a continuously operating spray can: the areas it points to receive veillance exposure, and are therefore painted. Longer exposure to veillance results in a surface being painted more than other places. Furthermore, a surface closer to the spray can receives a higher power or concentration of paint than the same surface would farther away. The resulting residue left on the surfaces can be considered the veillogram.
This chapter describes the set-up procedures, algorithms, and mathematical framework for creating veillograms from a sensor such as a video camera. The chapter is organized into sections: camera veillance field formulation; surface definition, formulation, and detection; ray tracing algorithms; and finally veillance bucketing, colour mapping, and 3D modelling.
5.1 Camera veillance field formulation
In previous chapters, the veillance field for video cameras was generalized and modelled as uniformly distributed vectors within the field of view of the camera, as numerous as the number of pixels per frame. It was also established that the strength of veillance is proportional to the inverse square of the distance between the source and the surface. This is visualized in figure 56.
Figure 56: Diagram showing a video veillance source represented by vectors propagating radially outwards from the camera; the vectors are uniformly distributed and are subject to veillance degeneration.
For example, consider a 640 by 480 pixel camera with a field of view 80 degrees wide and 60 degrees high. In the ideal case, 640 by 480 vectors would travel radially outward from the camera, with an angular separation of 0.125 degrees horizontally and 0.125 degrees vertically between neighbouring vectors. The veillance vectors are symmetrical about the optical axis with equal veillance power in the ideal case. As the camera moves or rotates in space, changing the position or orientation of the optical axis, the same transformations are applied to the propagated veillance vectors.
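Generating such an ideal vector field can be sketched as follows; the axis convention (optical axis along +z) and function name are assumptions for illustration:

```python
import math

def veillance_vectors(width_px, height_px, hfov_deg, vfov_deg):
    """Generate unit veillance vectors for an ideal (distortion-free) camera.

    One vector per pixel, uniformly spaced in angle across the field of
    view and symmetric about the optical axis (taken here as +z).
    """
    dh = hfov_deg / width_px    # horizontal angular separation per pixel
    dv = vfov_deg / height_px   # vertical angular separation per pixel
    vectors = []
    for row in range(height_px):
        for col in range(width_px):
            yaw = math.radians((col + 0.5) * dh - hfov_deg / 2)
            pitch = math.radians((row + 0.5) * dv - vfov_deg / 2)
            x = math.sin(yaw) * math.cos(pitch)
            y = math.sin(pitch)
            z = math.cos(yaw) * math.cos(pitch)
            vectors.append((x, y, z))
    return vectors
```

For the 640 by 480 example above, `dh` and `dv` both come out to 0.125 degrees, matching the angular separation stated in the text.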
5.1.1 Cameras with barrel or pincushion distortions
However, not all cameras are ideal; they can be subject to a variety of distortions. Among the most common are barrel and pincushion distortions, found frequently in simple webcams and pinhole cameras. An image found online [75] illustrates these distortion effects in figure 57; the barrel effect is also apparent in the camera used in figure 44.
Figure 57: Images visualizing the effects of barrel distortion (left two images) and pincushion distortion (right two images). The amount by which pixels appear closer to or further from the center of the image is proportional to the original pixel's radial distance from the center. [75]
In barrel distortion, a pixel appears closer to the image center the further away it is in the original, and the distortion can be modelled as:

x_distorted = x (1 + k1 r² + k2 r⁴ + k3 r⁶)
y_distorted = y (1 + k1 r² + k2 r⁴ + k3 r⁶)

where r(x,y) is the radial (Euclidean) distance from the image center before the distortion. The distortion coefficients k1, k2, and k3 can be obtained using calibration techniques such as a chessboard: a regular chessboard pattern is printed and placed in front of the camera, and the coordinates of all its distorted corners are recorded. [76] Feeding the above model into a gradient descent algorithm, the mapping
between the undistorted coordinates and the distorted coordinates can be obtained. The OpenCV libraries [76] contain functions that perform this calibration, given input images containing the said boards. The ideal unit vectors can then be combined with the distortion equations to compute distorted veillance vectors that model these cameras better.
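A pure-Python sketch of the radial distortion model described above; the coefficients here are illustrative stand-ins for values that a chessboard calibration would supply:

```python
def radial_distort(x, y, k1, k2, k3):
    """Apply the radial distortion model to normalized image coordinates.

    r is the Euclidean distance from the image centre. A negative k1
    pulls points toward the centre (barrel distortion); a positive k1
    pushes them outward (pincushion).
    """
    r2 = x * x + y * y
    factor = 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 * r2 * r2
    return x * factor, y * factor
```

Applying this mapping to the ideal unit-vector grid produces the distorted veillance vectors mentioned in the text.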
5.1.2 Cameras with vignetting effects
Another common camera distortion is vignetting, which is also common in cheaply constructed webcams with poor optics. Figure 58, found online, [77] illustrates the vignetting effect, where the brightness of a pixel is attenuated by a factor of the fourth power of the cosine of the angle between the subject pixel and the optical axis.
Figure 58: Photograph of a park found online [77] that visualizes the effect of vignetting in cameras. As the angle of the subject material from the optical axis increases, the strength of the pixel degenerates by a cosine-to-the-fourth factor.
Similar to the barrel effects, cameras can be calibrated using a whiteboard, and the relationship between the pixels and the veillance strength envelope can be computed. These coefficients are stored in the veillance vector object and replace the equal-magnitude assumption made in the ideal case, so the resultant power is the product of the radial factor and the vignetting factor.
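Combining the two degeneracy factors can be sketched as follows; the unit constants and function name are assumptions for illustration:

```python
import math

def veillance_power(distance, angle_deg):
    """Relative veillance power combining both degeneracy factors.

    Inverse-square falloff with distance, multiplied by the cos^4
    vignetting envelope for off-axis angles.
    """
    vignette = math.cos(math.radians(angle_deg)) ** 4
    return vignette / (distance ** 2)
```

On the optical axis the vignetting factor is 1 and only the inverse-square term remains; at 60 degrees off-axis the cos⁴ envelope alone reduces the power sixteenfold.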
5.2 Surface definition, formulation, and detection
As veillance propagates in space, the computer needs to know whether the vectors intersect with some sort of surface. Although there are many advanced ways to accomplish surface detection and tracking, this thesis identifies subject surfaces with object tracking markers; from here on, these surfaces are referred to as surfaces of interest. For example, if a 3D veillogram model is provided to a piano designer for user behavioural analysis, then only the surfaces of the piano under test would be
surfaces of interest, and not the tabletop. Each surface is then registered by its marker's unique identification number, and the relative offset to the center of the surface is recorded, along with the physical dimensions of that surface, such as width and height, in a configuration file.
5.2.1 Marker tracking using ArUco codes
Surfaces of interest are tagged and identified using 5-centimeter-wide square markers (although reducing to 3-centimeter-wide markers does not affect detection accuracy significantly), each with a unique identification number. The ArUco markers [78] used in this thesis have a data definition of 6x6 squares, as shown in figure 59. The exterior bits are always black for identification, and the interior 4x4 bits allow 16 bits of information to be stored, creating 65536 unique identification numbers. Higher resolution markers are possible, but would require a larger physical width to maintain detection accuracy.
Figure 59: Left: Examples of ArUco markers, each one identifying a potential surface of interest. Right: ArUco markers mounted on a cardboard box, one marker registering each side.
The ArUco marker library is available through the OpenCV platform and can be used for real-time detection of the markers. The camera first needs to be calibrated with a chessboard to obtain a camera matrix, which accounts for various camera distortions and for scaling observed pixels to physical dimensions. The algorithm provided [78] uses edge and corner detection to estimate the corners of the marker in the image. Knowing that the corners should resemble an upright unit square, the rotational and translational transformations are estimated in the pose estimation process. From the perspective of the camera, if a unit square were sitting at the location of the camera with its normal parallel to the optical axis (y axis), applying these pose estimates to the unit vertices (-1,0,1), (1,0,1), (1,0,-1) and (-1,0,-1) results in the transformed coordinates seen by the camera in 3D space. The data is then projected using perspective mapping with respect to the camera, allowing information to be overlaid in the detected
orientation. Figure 60 shows examples of augmenting stored or generated digital data onto various surfaces with markers attached. Note that the objects are centered on these surfaces and share the orientation of the markers' surface normals.
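The pose application step can be sketched as follows, assuming a 3x3 rotation matrix and a translation vector as a pose estimator would produce; the function name is illustrative:

```python
def transform_corners(rotation, translation, half_width=1.0):
    """Apply an estimated marker pose to the unit-square corner vertices.

    rotation is a 3x3 matrix (row-major nested lists) and translation a
    3-vector; the corners follow the (-1,0,1) ... (-1,0,-1) convention
    from the text, with the marker normal along the y axis.
    """
    corners = [(-half_width, 0, half_width), (half_width, 0, half_width),
               (half_width, 0, -half_width), (-half_width, 0, -half_width)]
    out = []
    for p in corners:
        # q = R p + t, one component at a time.
        q = tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                  for i in range(3))
        out.append(q)
    return out
```

With the identity rotation and a pure translation, the square simply shifts along the translation vector, which is a convenient sanity check.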
Figure 60: Far-left: 3D wave (modelled as a time-variant 3D sinusoid) animation augmented on a marker. Mid-left: a set of Cartesian axes overlaid on a marker, with a spiral wrapped around the normal axis. Mid-right: the same spiral as the left, with animation effects included. Far-right: audio veillance data downsampled as a 3D point cloud, augmented on a marker.
With the markers implemented, the camera is able to digitally identify surfaces of interest and, in real time, estimate the distance and orientation of the marker with respect to the camera. This is particularly robust in situations where the camera or the subject surface might move around.
5.2.2 Surface tracking limitations
However, there is one main disadvantage of using trackers: the camera may see only a fraction of a surface of interest that does not include the marker. Figure 61, left, shows one possible case; the red line shows the field of view of the camera under test.
Figure 61: Left: One of the main issues with a marker representing an entire surface arises when the camera sees only a part of the surface and not the marker. Right: One way to help with the issue is to place multiple markers on corners
One way to reduce (though not eliminate) the problem is to install additional markers on the edges and corners of the surface, which is also useful when the surface is a control panel or poster that should not be obstructed by markers. However, if the camera zooms in on a part of the poster with no marker, or sees only part of a marker, then the veillogram is no longer accurate. The future work section discusses some alternatives involving edge detection and tracking, possibly with accelerometers and gyroscopes; this is an advanced topic outside the scope of this thesis.
5.3 3D geometry and ray tracing techniques
At this point, the veillance vectors and the surface normals (and corner points) of all surfaces are modelled in 3D space, all with the camera as the coordinate origin. Because the coordinates are relative to the camera, the system still functions correctly if the camera or the objects move. During any frame in which at least one marker is detected, each unit vector u computed in section 5.1 is extended into a ray r, parameterized by t, described as r(t) = t(ux, uy, uz).
The surface normal of a detected surface can be computed from the first three corners in 3D space: forming two vectors extending from the middle point towards the other two, their cross product gives the normal, n. The plane on which the marker lies can then be written as n · (x, y, z) + D = 0, where the constant D is found by substituting any point on the surface into the equation.
Substituting r(t) into the plane equation produces a valid point of intersection of the ray and the plane, provided the parametric solution t exists and is positive. To verify that the point lies inside the surface of interest, the point of intersection is transformed back with respect to the camera by multiplying by the inverse transformation matrix. If the point lies within (-xr,0,zr), (xr,0,zr), (xr,0,-zr) and (-xr,0,-zr), then the candidate is an interior point, where xr and zr are computed as the ratios of the surface width and surface height, respectively, to the marker side width. For an interior point, the distance is computed using the positive parametric solution t and the known distance scale. When the veillance is modelled as a wave rather than vectors, the veillance power at the point of intersection is incremented by the original magnitude of the veillance vector multiplied by the inverse square of the distance. When using a finite number of vectors propagating radially, the power degeneracy is already represented by the geometry of the rays: the further the intersection, the more spread out the intersections are, with the same proportions as the inverse square model.
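The ray-plane intersection and interior test described above can be sketched as follows; this simplified version assumes the hit point has already been transformed back into surface coordinates, and the function names are illustrative:

```python
def ray_plane_intersection(u, n, D):
    """Intersect the ray r(t) = t*u with the plane n . p + D = 0.

    Returns (t, point) for a valid hit, or None when the ray is parallel
    to the plane or the solution is behind the camera (t <= 0).
    """
    denom = sum(n[i] * u[i] for i in range(3))
    if abs(denom) < 1e-12:
        return None                       # ray parallel to plane
    t = -D / denom
    if t <= 0:
        return None                       # intersection behind the camera
    return t, tuple(t * u[i] for i in range(3))

def is_interior(p_local, xr, zr):
    """Check a back-transformed hit against the (-xr,0,zr)...(xr,0,-zr) bounds."""
    x, _, z = p_local
    return -xr <= x <= xr and -zr <= z <= zr
```

For the plane y = 2 (n = (0,1,0), D = -2) and a ray straight down the y axis, the solution is t = 2 with the hit at (0, 2, 0).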
5.4 Veillance bucketing, colour mapping and 3D modelling
Ultimately, a veillogram is produced in the form of a 3D model, so the user can examine the veillance exposure of sensors from various perspective and scale settings. Rendering a veillance model requires two components: the physical descriptions of the location, size, and orientation of all the surfaces, presented in global coordinates; and the veillance exposure distribution over these surfaces. While the coordinates are measured and stored by users, the textures that get painted over the surfaces of these models need to be generated by analyzing the camera and the surfaces it looked at over a time period.
From the previous step, the points of intersection (POIs) verified as interior points are recorded in a list. Each POI contains the following information: the exact location of the intersection on the surface, the identification number of that surface, and the value of the veillance power at the point of intersection. For every surface that has a marker registered to it, a matrix is allocated to store the veillance data. If a surface has multiple markers registered, only one marker's POIs and its data are processed.
Since the veillance data is to be displayed on a monitor of finite resolution as multiple texture images, the veillance data needs to be stored in buckets. Typically each veillance accumulator array is a few hundred pixels wide, depending on the aspect ratio and the relative size of the surface it represents. This determines the resolution of the texture images. Each location of intersection is rounded into its corresponding bucket on the particular array, and the contributing vector's veillance strength is added to that bucket. This is done for all the verified POIs across all the frames of the video being analyzed, since a veillogram is a time integral of sensory perception. A finite distribution of vectors creates problematic gaps between POIs when the camera views the surface from a greater distance. This problem and its solution are discussed in chapter 7.
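The bucketing step can be sketched as a minimal accumulator, assuming the POI location is expressed in surface units with the origin at one corner; the names and resolution here are illustrative only.

```python
def accumulate_poi(buckets, poi_x, poi_z, strength, surf_w, surf_h):
    """Round an interior POI into its bucket and add the veillance strength.

    buckets: 2D list (rows x cols) acting as the veillance accumulator array.
    poi_x, poi_z: intersection location on the surface, in surface units
                  (0..surf_w horizontally, 0..surf_h vertically).
    """
    rows, cols = len(buckets), len(buckets[0])
    col = min(int(poi_x / surf_w * cols), cols - 1)   # clamp the edge case poi_x == surf_w
    row = min(int(poi_z / surf_h * rows), rows - 1)
    buckets[row][col] += strength
    return row, col
```

Calling this for every verified POI in every frame accumulates the time integral described above.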
After all the veillance data is accumulated, the elements of every surface are scanned for the maximum veillance strength, against which the rest of the data is normalized to produce values ranging from 0 to 1. The veillance quantities are then mapped by a compressor function, similar to the ones examined in earlier chapters. The parameters of the compressor function are slightly more aggressive, to make locations with little veillance exposure more visible and to smooth out the highly concentrated regions. Afterwards the matrices are colour mapped according to the COLORMAP_PINK schema from the OpenCV library (the colour scheme is presented as figure 62) to produce the texture maps shown in figure 63. Finally, a 3D modelling program, for example one built on OpenGL, can load the textures to generate a veillogram, shown as figure 64.
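A sketch of the normalization and compression steps, assuming a simple power-law compressor (the actual compressor parameters used in the thesis are not reproduced here). The resulting [0, 1] values would then be scaled to 8-bit and colour mapped, e.g. with OpenCV's `cv2.applyColorMap(img, cv2.COLORMAP_PINK)`.

```python
def normalize_and_compress(values, gamma=0.35):
    """Normalize veillance strengths to [0, 1] by the global maximum, then
    apply a power-law compressor; gamma < 1 boosts low-veillance regions
    and smooths out highly concentrated ones."""
    vmax = max(max(row) for row in values)
    if vmax == 0:
        return [[0.0 for _ in row] for row in values]
    return [[(v / vmax) ** gamma for v in row] for row in values]
```

A smaller gamma corresponds to the "more aggressive" compression described above, pushing faint exposure values towards visibility.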
Figure 62: The colour mapping schema used for visualizing veillance quantities into a texture image.
Figure 63: Some of the texture maps produced from a camera recording are shown. The surfaces of interest are the bottom and side of a cardboard box, recorded for a brief duration.
Figure 64: An OpenGL veillogram render of the cardboard box shown in figure 59, with the produced texture maps. All the surfaces are labelled with their corresponding ArUco identification numbers. The render assumes perfect optical conditions and a camera free of distortions.
5.5 Summary
This chapter has examined in detail the procedure for constructing a veillogram: the vectorization and specification of the veillance field; surface modelling; ray tracing and finding intercepts; and data bucketing, colour maps, texture maps, and 3D modelling.
Chapter 6: Bioveillograms
This chapter explores the veillance fields of one of nature's most complex sensors - the human eye. The chapter starts with an introductory section explaining bioveillance, its importance in showing the extent of human visual attention, and its multitude of applications. The chapter then explores how to measure and model bioveillance so that it can be used to expand the existing veillance framework. The chapter then explains the steps required to implement a simple, lightweight eye tracking device that can be used with the bioveillance data to create 3D eye veillograms. The chapter concludes with suggestions for an improved setup using the EyeTap principle to increase system accuracy.
6.1 Bioveillance - human eyes as veillance-rich sensors
The following subsection is a paraphrased summary of the IEEE GEM (Games, Entertainment, and Media) paper '"Painting with eyes", Sensory perception flux time-integrated on the physical world', written by Sen Yang, Ryan Janzen, and Steve Mann. [43] The chapter explores the human eye as a veillance-rich sensor, rather than as a simple gaze point.
Eye tracking research has been a growing and increasingly active field, and demand for these systems is rising in both the consumer and commercial sectors. User interface design, aircraft pilot research, advertising and marketing studies, and immersive gaming are just a few examples. However, these applications often rely heavily on the simple direction of the eye gaze to reach a conclusion about where the user is looking and where their attention is focused, such as in the website traffic study shown in the introductory chapter. Unfortunately, gaze direction on its own reveals limited information about the sensory detail perceived - there is far more detail that can be perceived by humans away from the gaze point, elsewhere in the field of view. In fact, visual attention can be subconsciously redirected to limited parts of the field of view, even in peripheral vision, especially as a reaction to sudden moving objects inside the field of view. This occurs through neural processes that have, to some degree, little effect on the gaze angle. [79][80][81]
The veillance field of the eyes is in fact a complex field and should be measured as a combination of a variety of veillance vectors with differing strengths. Simply presenting gaze angles as circles or dots ignores the rich, expressive veillance field of the eye within its field of view. The following section aims to model the ability-to-sense of the eyes using non-intrusive eye tests.
6.2 Eye tests and bioveillance modelling
This section explores some hypotheses on the veillance distribution of the human eye, assuming the two eyes work independently, as the data gathered is based on only one eye. The section continues by exploring some of the previous eye tests conducted by Ryan Janzen [82], and also introduces new eye tests using absement concepts. [83] Finally, the sets of data are processed and analyzed to produce weighted veillance vectors that suggest the bioveillance of one eye.
6.2.1 Model hypothesis based on human anatomy
As stated in the introductory chapter, the human eye contains two types of photoreceptors: rods and cones. Most of the receptors are rods, accounting for 120 million; these function at low light levels and are very sensitive to changes in the amount of light, but are poor at distinguishing colour. On the other hand, there are merely 7 million cones, which function only at high light levels and can distinguish colour. [70][84] Figure 65 illustrates the distribution of cones and rods as a function of eccentricity in degrees. [84]
Figure 65: The distribution of rods and cones in the eye, as a function of eccentricity in degrees, where the optical
axis is at 0 degrees. [84] There is a high distribution of cones towards the center of the optical axis, allowing more
colour to be seen, and more rods towards the peripherals where the eye is more sensitive to light and motion.
The figure reveals that rods, the receptors most sensitive to levels of and changes in photoquantity, diminish in number approaching the optical axis, to the point where there are no rods at all at the center. There is a known technique amongst stargazers called averted vision. [85] When observing faint stars under dark skies (low light level conditions), a star that appears invisible when gazed at directly sometimes becomes visible when looked at using peripheral vision, where there are more rods operating in the dimly lit environment. At low light levels, the center of the eye is a photosensitivity blind spot; as the eccentricity increases in magnitude, light sensitivity increases while colour sensitivity decreases. In dimly lit conditions, the eyes may have greater veillance in their peripheral regions than in the region directly in front of the optical axis.
With the exception of dimly lit scenarios, in the majority of situations relevant to veillance applications there will be sufficient light for both cones and rods to be operational, with both essentially acting as receptors of photoquantity. With this in mind, the density of photoreceptors is highest at the center of the eye and gradually diminishes towards the periphery. Although not as strongly, significant levels of photoquantity and change in photoquantity (motion) are still sensed by the eye across an almost 180-degree arc in front of it. Therefore, the veillance strength of the eye should not be described by one simple gaze angle, but rather by a veillance flux (or a collection of vectors detailed enough to represent such flux). Consistent with the illustration in figure 66, it is therefore hypothesized that veillance (density) is strongest at the axis and diminishes as eccentricity increases.
Figure 66: Left: A photograph of a participant with a speculated veillance flux overlaid as colour-coded shapes propagating radially outwards from the left eye. [86] The data is obtained from an eye test; red shapes indicate a high level of photosensitivity, and blue indicates a relatively lower level. Right: A higher density of photoreceptors indicates stronger definition, higher spatial resolution, and smaller vixels near the optical axis. The vixels increase in size as eccentricity increases. [70][84]
6.2.2 Previous experiment setup
There has been previous work by Steve Mann and Ryan Janzen on eye tests with varying stimulus photoquantities in well-lit environments. [86] As shown in figure 67, a participant is seated near the center of a stimulus display, such as a monitor, at a fixed distance away. The participant is required to stay as still as possible, with as little eye and/or head movement as possible. During the test, a letter appears in a crosshair at the center of the screen for a limited duration. At the same time, somewhere else on the screen, a stimulus is shown, classified as either strong or weak. The strong stimulus may differ from the weak one by attributes such as brightness (opacity value), the number of stimuli placed in close proximity, size, and type of shape. The user must input the capital version of the center letter if a strong stimulus is observed; otherwise a lower-case letter is required. For example, the correct response for figure 67, right, is an uppercase T, because the letter is t and a strong stimulus is shown. An incorrect entry invalidates the attempt, which is retried at a later time. The test requires the user to react quickly and respond honestly. The time it takes the user to generate the correct output is recorded, and the process is repeated multiple times, with each pass covering every possible point. The recorded data is then generalized with underfitting parameters to generate a very smooth model of the veillance distribution, shown as figure 68.
Figure 67: Left: The experimental setup for the original eye test program. The user is positioned such that their
optical axis directly aligns to the normal of the screen. Right: The screen would then display a weak or strong
stimulus at some random location while a letter is generated in the center.
Figure 68: The veillance flux emitted by test subject BW, sourced from Janzen's earlier work on bioveillametry. [86] Note that the data presented has a log scale, radial decay (consistent with the veillance degeneracy property), and an underfitted 3rd-order polynomial regression for smoothing.
The test is well designed to account for the varying degrees of the veillance field using a wide variety of stimuli, and attempts to quantify it by response time. However, it is difficult to quantify the effects of these changing stimulus attributes, and even harder to correlate these changes with response time. Furthermore, it is possible for the user to cheat the test in a few ways: the user can guess the input (although there are penalties for doing so), and the user can also move their gaze away from the center; provided the action is carried out quickly enough, a correct response is still possible. Lastly, there may be inconsistencies in response time and variation in input time, making the data inaccurate unless a very large amount of data is available.
6.2.3 New experiment with application of absement
To improve the accuracy of the eye test experiment mentioned above, the metric for veillance strength is reconsidered. The honesty factors are also reconsidered: instead of hoping the participants will not cheat, the system should quantify and appropriately penalize the amount of cheating, in other words, the cumulative gaze change.
The experiment is redesigned as follows: a participant is instructed to sit still in front of a stimulus screen (monitor), similar to the previous setup. The participant is provided a tripod, adjusted to the correct height so that the optical axis of the eye under test is aligned with the center of the display, which acts as a headrest to prevent head movement. To further ensure that the user's head does not move during the experiment, a digital gyroscope can also be added to detect head movement and penalize the participant accordingly for the corresponding data point. The idea is to keep the participant's head as still as possible. Furthermore, the participant is given an eye tracker to wear to monitor how much their eye moves during the experiment. For details on the eye tracker, please consult the next section.
Figure 69: The initial instructions given to eye test participants. The participant has their head rested on a headrest while wearing an eye tracker when participating in this experiment.
Figure 70: Left: the first stage of the test for a new data point - a blue number appears inside the circle at the center of the screen, which the participant needs to enter. Right: after the center number is entered, a white number appears at a random location on the screen. This process is repeated for every possible point in a random order.
The user is then given the instructions shown in figure 69, telling them to keep their gaze fixed on the center of the screen as much as possible. A random number (shown in figure 70, left, as a faint blue '1') appears in a permanently fixed blue circle at the center of the screen; as soon as the correct key is pressed, the number disappears, and another number appears at a random location elsewhere (shown as '2' in figure 70, right). The user must enter the correct number for it to disappear. An incorrect entry penalizes the participant severely for that data point. The user may press 'space' if they are clueless about the location of the stimulus, which applies some penalty to that location, but not as much as an incorrect input. There are 20 coordinates horizontally and 10 vertically where the number can possibly appear, including the center pixel itself, which shows a white number instead of a blue one. The experiment is conducted 5 times, with all 200 coordinates covered each time in a random order. The randomness of the sequence is a function of the current time, so the experiment is different every time the program is run.
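The randomized presentation order can be sketched as below, assuming the 20 x 10 grid and a time-derived seed; the actual program's sequencing details are not documented here, so the names are illustrative.

```python
import random
import time

GRID_COLS, GRID_ROWS = 20, 10    # 200 stimulus coordinates per run

def stimulus_sequence(seed=None):
    """Generate one run's presentation order: every grid coordinate exactly
    once, shuffled with a time-derived seed so each execution differs."""
    rng = random.Random(seed if seed is not None else time.time())
    coords = [(c, r) for r in range(GRID_ROWS) for c in range(GRID_COLS)]
    rng.shuffle(coords)
    return coords
```

Running this 5 times (once per pass of the experiment) covers all 200 coordinates each time, in a different order per run.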
Instead of quantifying information sensitivity by response time, this thesis proposes the integrated amount of eye motion as the new metric. The claim is that when the stimulus is right in front of the optical axis, the contents of the stimulus are in plain sight and can be easily extracted, since the center of the eye is established to have the most photoreceptors. As the stimulus gets further away within the field of view, sensory capacity degenerates until, at some distance, the symbol is still visible but only as a blurred blob. The user may also have a rough sense of the location of the stimulus when it pops up, perceived through motion. Since the participant is required to enter the number, they may move their eye ever so slightly to make out the somewhat distant number. Lastly, for the furthest numbers, not even a blob is visible. The participant has to gaze around to locate the number, leaving the most accumulated movement on the eye tracker.
Absement [83], or the time integral of displacement, of the eyeball is recorded for every data point. When the user enters the center number correctly, the total absement is set to 0 and the location of the eye at that time is set as the new reference; in a separate process, the distance between the current eyeball location and the previously recorded location is added to the penalty value, with the new position then stored as the previous one. This addition happens about 30-40 times a second, and halts when the user enters the prediction for the white number. The process then repeats until the program terminates.
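The per-data-point accumulation described above can be sketched as a running sum of pupil displacements between consecutive samples; this is an illustrative approximation of the absement bookkeeping, not the thesis's exact code.

```python
import math

def accumulate_absement(samples):
    """Sum the pupil displacement between consecutive ~30-40 Hz samples,
    approximating the per-data-point eye absement score described above.

    samples: list of (x, y) pupil positions, starting at the reference
    recorded when the center number was entered correctly."""
    total = 0.0
    prev = samples[0]
    for cur in samples[1:]:
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur                       # new position becomes the previous
    return total
```

A perfectly still eye yields 0, while searching the screen leaves a large accumulated value, matching the intent of the penalty scheme.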
Lastly, a matrix is allocated during execution with each element representing the eye absement recorded at the corresponding location, accessible using row and column numbers as indices, plus any applicable additional penalty. The matrix is then normalized between the global maximum and minimum, and then colour mapped using the same schema presented in chapter 5, with an expansion function. The result, presented as figure 71, is an image representing the veillance strength of the right eye of participant S.Y. when looking at these particular points with a fixed gaze. Note that this image is 20 pixels wide by 10 pixels tall, representing the resolution of the test. The data points for this image were obtained from a participant sitting approximately 6 centimeters away from a screen, with the program window, 15.5 cm by 8.0 cm, centered.
Figure 71: The 20 by 10 pixel image visualizing the total eye gaze displacement of participant S.Y.'s right eye when looking at various stimuli at controlled locations, starting from an initial gaze at the center of the screen. The absement values are normalized as a range, expanded, and colour mapped to produce this image.
Note that this is a colour map of eye absement, or total displacement of the eye, and not one of veillance, or the sensing ability of the eye. The dark regions of the figure indicate regions where the stimulus is very visible and can be distinguished without a significant amount of eye movement. The pinkish regions indicate regions of poor sensory density, where the participant can sense the location of the stimulus within their peripheral vision, but is not able to distinguish the characters themselves without shifting their gaze towards the stimulus. White regions indicate areas possibly outside of active peripheral vision, where the participant either gave up on looking by pressing SPACE or looked around everywhere searching for the stimulus. After the test concludes, points labelled as given up receive the same score as the maximum absement. When the user produces an incorrect response, the score is double the maximum. This is an effort to encourage honesty in the experiment. The inverse of total eye absement is used in this thesis as the metric to quantify eye veillance, or the sensory density and sensing capacity as a function of varying angles.
6.3 Eye tracker implementation
In order to collect eye absement data, a system that tracks the cumulative amount of eye displacement is needed. Furthermore, as will be discussed in section 6.4 on creating veillograms for the human eye, an eye tracking device is required to estimate the gaze direction of the user in order to paint veillance in that general direction. Since creating a high-accuracy eye tracker is not the main focus of this thesis, this section only discusses the implementation of a lightweight tracker at a high level, documented for readers who are interested.
6.3.1 Eye tracker basics
There are multiple methods being actively researched to achieve eye tracking - visible light camera tracking, [87, 88] EEG (electroencephalography) or EMG (electromyography) tracking, [89] and pupil center corneal reflection eye tracking, [90] to name a few. This thesis implements an eye tracker using the pupil center corneal reflection technique. This technique allows key features of the eye, such as the pupil, to be contrasted from the iris for reliable and consistent computer vision detection of the pupil, labelled in figure 72.
Figure 72: [91, online] Left: A photograph showing the effects of bright pupil illumination, with the pupil appearing extremely bright as it retroreflects the incoming infrared shone along the optical axis. Right: A photograph showing the effects of dark pupil illumination, with the iris internally reflecting the incoming infrared light, leaving a stronger contrast between the pupil and the iris.
There are two classifications of the pupil center corneal reflection method: dark pupil tracking and bright pupil tracking. Both methods involve exposing the eye to some level of infrared (IR) radiation. [91, 92] In the bright pupil method, the infrared light and the camera are aligned with the optical axis. Due to the effects of the tapetum lucidum, [93] a retroreflective tissue located behind the retina, the infrared is reflected back directly along an antiparallel path. [94] This creates an effect similar to the "red-eye" effect in photography, where animals or people have a bright, illuminated pupil, similar to figure 73, bottom. However, if the infrared light is shone at an angle with respect to the optical axis of the eye, with the camera facing along the axis, the entire iris appears illuminated by the infrared light with the pupil unaffected, creating a large contrast between the two bodies; this is known as the dark pupil illumination method. A visualization is shown as figure 73, top. This thesis uses a dark pupil eye tracker, due to its greater ability to contrast the pupil from the rest of the image, making tracking reliable with relative ease of implementation.
Notice that in both illumination methods there is also a strong dot visible on the eye, which is a reflection of the infrared source. Due to the largely spherical nature of the eyeball, this reflection (referred to as the infrared brightspot, or simply brightspot, in this thesis) is stationary and independent of the gaze direction of the eye, provided the gaze angle does not deviate extremely from the optical axis. Using the computer vision methods discussed later in this chapter, the center of the pupil and the stationary brightspot can be detected, and their relative displacement can be used to compute a rough estimate of the gaze direction of the eye.
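A minimal sketch of this displacement-based gaze estimate, assuming a linear mapping with per-axis gains obtained from calibration; the linear model and the gain parameters are hypothetical simplifications, not the thesis's exact mapping.

```python
def estimate_gaze(pupil, brightspot, gain_x, gain_y):
    """Rough gaze estimate: the pupil-center-to-brightspot displacement
    (in pixels) scaled by per-axis calibration gains. Since the brightspot
    is stationary, the displacement moves only when the gaze does."""
    dx = pupil[0] - brightspot[0]
    dy = pupil[1] - brightspot[1]
    return (gain_x * dx, gain_y * dy)
```

With the brightspot fixed, a centered pupil yields a zero offset, and any gaze shift scales linearly through the calibrated gains.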
Figure 73: A diagram illustrating the difference between the dark pupil and bright pupil illumination methods. [90] These are the two classifications of pupil center corneal reflection eye tracking. The main difference between the two methods is the angle at which the directional infrared light is directed with respect to the optical axis of the eye.
6.3.2 Eye tracker hardware implementation
This subsection describes the hardware implementation of a dark pupil illumination eye tracking device, referred to as the eye tracker in this thesis. The design is adapted from "building a lightweight headgear" by Babcock and Pelz, [95] with modifications in the choice of camera and controller components, the addition of infrared filters for the eye tracking camera, changes to the data storage system, and the overall head-mounted glasses design.
As illustrated in figure 74, the implemented eye tracking system consists of three main electronic components: the infrared (IR) current regulation unit connected to an IR LED unit, an eye tracking camera, and a scene camera, which is not used directly for eye tracking. The eye tracking camera looks at the eye, with the IR directed from a small distance away from the camera to create the dark pupil effect.
Figure 74: A labelled photograph of the implemented simple, lightweight and functional eye tracking prototype. The prototype consists of three main electronic components: the infrared (IR) current regulation unit connected to an IR LED unit, an eye tracking camera, and a scene camera, which is not used directly for eye tracking.
The amount of infrared radiation determines the strength of the dark pupil illumination effect. In experiments with the implemented tracker, the stronger the radiation, the more contrast was found between the iris and the pupil. However, an excessive amount of near-infrared radiation of wavelengths between 700 nm and 1400 nm entering the eyes can cause severe, irreversible damage to the cornea and retina, depending on the strength of the exposure. [96] Experiments with visible-wavelength LEDs proved unsuccessful due to their unreliability under environmental lighting, as well as the discomfort caused for participants. A 5 mm, 940 nm wavelength IR LED is connected to a 12-volt DC supply, with the current directed into the IR LED regulated by the circuit from Babcock and Pelz's paper. [95] The current directed into the LED produces an adjustable irradiance output level of no more than 10 mW/cm2. This output level is considered safe for this infrared radiation range. [95, 96]
The eye tracking camera is extracted from a cheap webcam rather than the IR-range camera described in the original paper, with the infrared filter, a thin red piece of glass in front of the sensor, carefully removed. Furthermore, an infrared-range bandpass filter is carefully attached in front of the camera to make it much more sensitive to IR radiation and more robust to environmental lighting effects. The system is very robust across a variety of lighting conditions, except when additional IR radiation is introduced, such as when taking the system outdoors. The cameras are designed to connect directly to a laptop for real-time processing. The components are carefully and firmly attached to a head mount (a modified eyeglass frame), with the LED and eye camera directed at the participant's eye.
6.3.3 Eye tracker software implementation
With the participant wearing the eye tracking device and the LED providing sufficient levels of IR output, the modified web camera continuously feeds images of the eye, under the dark pupil illumination condition, into a connected processing device, such as a computer, over USB. A computer vision program is written with the aim of tracking the location of the eye and also estimating the user's gaze direction, for the future uses described in section 6.4.
Starting from the raw camera frame (such as the one in figure 75, left), an inverse pixel-value threshold is applied to filter out bright details and suppress regions with insufficient IR illumination, as shown in figure 75, middle. The thresholded binary image allows easier edge detection. Afterwards, a Gaussian blur followed by Sobel-Feldman edge detection is applied to obtain a list of non-connected contours, shown as figure 75, right. Each set of connected pixels is considered a contour, and each is considered a possible pupil contour candidate.
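The first stage of this pipeline can be sketched in pure Python as below; in practice the equivalent OpenCV calls would be `cv2.threshold(..., cv2.THRESH_BINARY_INV)` followed by `cv2.GaussianBlur` and a Sobel operator, but this stdlib version illustrates the inverse threshold itself. The threshold value is illustrative.

```python
def inverse_threshold(gray, thresh):
    """Inverse binary threshold on a grayscale image (2D list of 0-255
    values): dark pupil pixels below `thresh` map to 255, while bright
    details (iris, reflections, well-lit skin) are suppressed to 0."""
    return [[255 if px < thresh else 0 for px in row] for row in gray]
```

The resulting binary image is what makes the subsequent blur-and-edge-detect stage cheap and reliable, since only dark regions such as the pupil survive.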
Figure 75: Left: A raw image from the eye tracking camera, under the dark pupil illumination effect. Middle: A thresholded and inverted image of the raw frame. Right: The thresholded image allows edge detection to extract a list of contours; these contours are used for pupil detection.
Figure 76: A program-generated prediction of the pupil's location using the method described. The center of the pupil (shown as a white dot) is the center of the ellipse that best fits the contour of the pupil candidate. A white rectangular bounding box is also shown. The bigger black dot shows the location of the brightspot reflection.
A set of rules is applied to validate each contour as a pupil candidate, or else eliminate it from the list of candidates. The contour must satisfy certain requirements: it must be within a particular area of the image (the region inside the eye socket); the contour area must match that of an ellipse within a small error margin based on its perimeter; and the bounding box of the contour must be within a particular aspect ratio, to eliminate long elliptical shapes that are not the pupil. Furthermore, the IR brightspot location is also tracked, and if it falls within the pupil contour, the surface area check is modified to add the area difference. From all the candidates, the algorithm takes the one closest to the previous pupil location within some Euclidean distance error margin. The error margin starts at a very high value, as there is no prior knowledge of the location of the pupil, but once the location is known and nearby, the certainty increases and the error margin decreases. When the pupil becomes undetectable, resulting in a skipped frame, the margin is relaxed until the location is certain again. This method allows reliable selection of the candidate based on its known location, and is robust to situations such as sudden eye location changes, or blinking while the pupil location changes. The final contour has all of its points (110-160 pixels) fitted to an ellipse that minimizes the root mean square error. The center of that ellipse becomes the pupil center location. A similar process is used for the detection of the brightspot, using a greater threshold value and a smaller area check consistent with the historically recorded size of the brightspot. Figure 76 shows an augmented photograph of the eye. The pupil center is shown as the white dot, which is also the center of the ellipse that best fits the pupil candidate contour. The bigger black dot shows the location of the brightspot.
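The candidate selection with the adaptive error margin can be sketched as below. This is a minimal illustration; the area, ellipse-fit, and aspect-ratio checks described above are assumed to have already filtered the candidate list, and the names are hypothetical.

```python
import math

def select_pupil(candidates, prev_center, margin):
    """Pick the candidate center closest to the previous pupil location,
    provided it falls within the current Euclidean error margin. Returns
    None on a skipped frame, so the caller can relax the margin and retry
    on the next frame."""
    best, best_d = None, margin
    for c in candidates:
        d = math.hypot(c[0] - prev_center[0], c[1] - prev_center[1])
        if d <= best_d:
            best, best_d = c, d
    return best
```

Starting with a very large margin handles the no-prior-knowledge case; shrinking it once the pupil is localized rejects spurious candidates far from the last known position.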
For the 640x480 pixel camera used in this tracker, this method avoids heavy loop computations such as circular Hough transforms (5 degrees of freedom for an ellipse model) that use a lot of memory and cycles. The algorithm is lightweight and produces a consistent 40-47 frames per second on a Lenovo IdeaPad U410, at an average, saturated detection rate of >97% (once the program has been running for more than a few minutes). The detection rate was measured under 8 different poorly or normally lit indoor scenes, such as a computer lab, staircases, a hallway, and storage rooms. However, the system performs less accurately in well-exposed areas, such as outdoors or underneath a well-lit window.
6.3.4 Eye tracker calibration
The tracker can now accurately estimate the location of the pupil, as required by the previous section. However, the participant is required to calibrate their eye tracking device before it can be used to estimate their gaze angle. This is because different people have different nose and eye structures, eye sizes, and feature locations. Since the tracker is mounted on the head, and the relative distance and orientation between the eye and the eye camera are constant, one calibration when wearing the device is usually sufficient, unless the user accidentally changes the orientation of the tracker.
For the calibration, the participant sits at a headrest (tripod) 40 centimeters away from the monitor, with their chin firmly stationed on the rest. The user is required to stare at 8 points around the edges and corners of the screen in clockwise order, until 200 pupil position samples are obtained for each of the 8 points. The calibration screen is shown as figure 77.
Figure 77: Screenshots of the eye tracker calibration user interface. The user has to calibrate gaze angles for each of the 8 dots on the edges and corners of the screen. The ArUco marker forces the user to keep the camera (and hence the eye) parallel to the screen and at the correct location and distance, within a small safety margin. If the actual mapping of the marker seen by the camera (light blue) is off by more than some margin, the program halts until the issue is fixed. An augmented eye image is provided at the bottom of the screen for debugging purposes.
An integrity mechanism is in place to enforce that the participant does not move their head during the calibration process, and to ensure the camera stays fairly parallel to the screen and close to the correct distance. An ArUco marker is placed in the middle of the screen; if the head-mounted camera does not see the marker, or if the marker is displaced too much due to camera misalignment (as shown on the left of figure 77), the calibration halts and the marker turns red until the position is corrected (as shown on the right of figure 77). Every time the process is interrupted, the user has to press a specific key to resume. The dark blue square indicates the desired ArUco position as mapped by the camera, while the light blue quadrilateral indicates the actual position and orientation. The active dot (top middle) is brightened as the spot the user should be looking at, with a green progress circle surrounding it. A debugging frame is shown at the bottom for rare cases (mostly due to unwanted lighting) when the tracker does not detect anything.
When the calibration process is completed, 1600 sample points have been recorded, 200 for each of the 8 points. The points are then averaged into 8 mass centers. One quick render of the calibration is shown in figure 78. The red points are the corner points, which will be used for calibration, while the white dots serve only as reference, to indicate the integrity of the data. Finally, the 4 red points are the corner inputs into a homographic transformation matrix function mapping to a square with the coordinates (-1, 1), (1, 1), (1, -1) and (-1, -1). Lastly, the expected eye-screen distance and the UI's physical dimensions on the laptop screen are used to scale the gaze angle of the eye. The homographic transformation approximates gaze angles directly from the location of the pupil center. However, this model only holds when the user's gaze moves around their optical axis, under the small-angle assumption sin(𝜽) ≅ 𝜽 when 𝜽 ≅ 0, and might not hold elsewhere. The validity of this assumption is examined later in this chapter, using a larger calibration set with data modelling techniques.
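The 4-corner mapping just described can be sketched with a direct linear transform. This is an illustrative numpy implementation (the actual program may rely on OpenCV's homography routines instead); `homography_4pt` and `apply_h` are hypothetical names.

```python
import numpy as np

def homography_4pt(src, dst):
    """Direct linear transform for exactly 4 point correspondences.

    src, dst: sequences of 4 (x, y) pairs. Returns a 3x3 matrix H such
    that H @ [x, y, 1] is projectively equal to the matching dst point.
    Here src would be the 4 corner pupil positions and dst the square
    (-1, 1), (1, 1), (1, -1), (-1, -1).
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(A, dtype=float)
    # The homography vector is the null vector of A: the right singular
    # vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)

def apply_h(H, pt):
    """Map a pupil-pixel coordinate through H, with perspective divide."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

With the small-angle assumption above, the mapped coordinate scales linearly to a gaze angle using the known screen dimensions and eye-screen distance.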
Figure 78: A visualization of the calibration screen's 8 mass centers, overlaid as white (edge points) and red (corner points) dots on a raw image. Each point shown is the average of 50 sample points. The corners of the calibration are used (when all 8 points visually resemble a square) to establish the scale between pixels seen by the eye camera and physical measurements in centimeters. The homographic transformation may be used to estimate small gaze angles when the user is not looking far away from the optical axis.
6.3.5 Gaze estimation
To understand the relationship between the gaze angle and the physical location of the pupil, and to verify
whether the simple, linear hypothesis of the previous section holds, a large scale calibration is conducted. Similar to the simple calibration process, the participant sits at a fixed distance of 0.9 meters in front of a large calibration board. The participant's head rests on a tripod serving as a headrest, and they are instructed not to move their head during the experiment. The calibration board, shown in figure 79, is made by stretching a patterned tablecloth over a piece of cardboard. The board has 11 by 15 white squares, with the white tiles separated horizontally and vertically by a distance of 5.08 cm. The participant begins by gazing at the top left corner of the board, and holds the gaze until 100 pupil coordinates are recorded. When a point is completely sampled, an auditory signal cues the participant to move on to the next target. This process repeats until all the targets are completed. The participant may, at any time, press a button to pause or resume the calibration, as the process is lengthy and may cause uncomfortable eye strain, especially when gazing at the edges and corners of the board.
Figure 79: The calibration board used for a large scale calibration. This is used as an attempt to establish a model
between the pupil position and gaze angle of the eye.
The coordinates of the pupil for each square are recorded and then plotted as an overlay on the last frame of the eye during the experiment, shown at the top of figure 80. Each white dot on the photographs represents one data point. The bottom of the figure shows 6 smaller images, with all the data dimmed except the row or column to be highlighted. The data in figure 80 suggests an accurate, consistent, monotonic and one-to-one relationship between the gaze angle and the pupil location, both vertically and horizontally. There is a gap at the intersection of row 8 and column 9, caused largely by brightspot interference. Otherwise, the data largely suggests a spherical (ellipsoidal) model, judging by the shape of the points. Intuitively, the shape of the eye is roughly a ball, and without access to a detailed model of the eye, a spherical estimate may be sufficient. However, fitting these points to an ellipsoidal model might be troublesome, considering that the points seen are a projection of the model itself under some unknown domain and perspective. But this data looks similar to something already encountered in section 5.1, when studying barrel effects and camera distortions. The
points here can be considered to be the chessboard corners seen by a very distorted camera with a heavy barrel effect. With this in mind, the model can be fitted with equations duplicated from before:
Figure 80: Top: The data points plotted as white dots representing the pupil center's coordinates on the picture of the eye (last frame). Bottom: The same image with the same data points, but all dimmed except the row or column of data that corresponds to the number shown in the top left corner.
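The equations referred to above did not survive transcription. Assuming they duplicate the standard radial (barrel) distortion model from section 5.1, which OpenCV also uses, they would take a form such as:

```latex
x_{\text{distorted}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad
y_{\text{distorted}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad
r^2 = x^2 + y^2
```

where k1, k2, and k3 are the radial distortion coefficients fitted to the data.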
Using gradient descent on all the data points, or using OpenCV's calibrateCamera [76] function with the averages of the data points per square as inputs, a relationship between the pupil center's pixel location on the image and the gaze position on the calibration board 0.9 m away can be established, using an inverse of the models above. Practically, the equation is solved iteratively using numerical methods. The gaze location on the board can be used to compute the gaze angles using simple geometry, since the spacing of the rows and columns away from the center square is known (5.08 cm) and the distance between the eye and the board is known (0.9 m). This produces two rotation angles for any arbitrary input inside the field of view of the participant: one with respect to the vertical axis (the up axis in computer graphics terminology) of the eye, and one with respect to the optical axis of the eye. These two angles are sufficient parameters to describe the gaze angle.
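The simple geometry just described can be sketched as follows. `gaze_angles` is a hypothetical helper, and the axis conventions are illustrative assumptions.

```python
import math

def gaze_angles(cols_from_center, rows_from_center,
                tile=0.0508, distance=0.9):
    """Convert a gaze target's grid offset on the calibration board
    into two gaze angles (radians), using the measurements in the
    text: tiles 5.08 cm apart, board 0.9 m from the eye."""
    # Angle about the vertical ("up") axis of the eye
    yaw = math.atan2(cols_from_center * tile, distance)
    # Angle of elevation relative to the optical axis
    pitch = math.atan2(rows_from_center * tile, distance)
    return yaw, pitch
```

For small offsets these angles are nearly linear in the grid offset, which is why the 8-point homography calibration remains a fair approximation near the optical axis.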
For better estimation accuracy, participants are encouraged to complete the large calibration set, although it is quite a lengthy process, and participants have often complained of eye strain, especially when looking at the farther squares. The simpler calibration with 8 points provides a fair estimate of the gaze direction when the user is looking at small angles. Note that for the spherical model, or the barrel effect model used, distortion with respect to a linear grid (the error of using a linear model) increases radially outwards. Furthermore, participants would unconsciously move their heads to look at the farther squares to avoid eye strain. In a practical application where users are allowed to move, extreme eye gaze angles are typically avoided by the participants, as they make the eyes strained and uncomfortable.
6.4 Creating veillograms from the human eye
With the eye tracking system implemented, the veillograms of the human eye can be created in a similar
manner to that of a camera, discussed in chapter 5. The bio-veillogram process has the same three components as a regular veillogram with a video camera, with slight modifications. First, the veillance vectors of the eye are no longer uniform, as the distribution and density of photoreceptors are non-uniform. Depending on the resolution of the bioveillance model created, the model is simply a downsampled, vectorized model of the bio-veillance, which is far more detailed. Similar to the camera, the veillance power degenerates in an inverse square relationship with respect to the radial distance. The 8 by 9 bioveillance data from 6.1 is stored as vector magnitudes. As for the second veillance component, a scene camera is added to the eye tracking system described earlier, shown in figure 81. The detection and pose estimation of ArUco markers operate the same as in the video camera case.
Figure 81: The front view of the bioveillance system prototype. The prototype combines a dark pupil effect tracker and a scene camera; the latter is an outward facing, wide-view camera that aims to detect markers placed in the general direction of the eye.
Finally, for the component with ray tracing and finding the locations of ray-plane intersections, there are some major differences between using the eye and the camera as sensors. In the camera veillance case, veillance vectors centered on the optical axis emanate radially outwards onto surfaces of interest. In the eye veillance case, the bioveillance is no longer centered on the optical axis of the camera, but on the optical axis of the eye. The eye's optical axis (assuming the eye has zero gaze angle with respect to the eye tracking camera) is opposite to that of the eye camera. A perspective transformation is applied to the eye camera's veillance vectors to translate them into the scene camera's coordinates, as all markers detected are with respect to the scene camera. Furthermore, the participant will change gaze during the time that they interact with the surfaces of interest. In that case, a rotation matrix needs to be applied to the eye camera vector before the transformations are carried out, using the two gaze estimate angles. The rotation vector (r) can be converted to a rotation matrix (R) using OpenCV's Rodrigues function [97]:
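The conversion that cv2.Rodrigues performs can be sketched in numpy as follows; the thesis uses the OpenCV function itself, and this standalone version is only for illustration.

```python
import numpy as np

def rodrigues(r):
    """Convert a rotation vector r (axis * angle, in radians) into a
    3x3 rotation matrix via the Rodrigues formula:
        R = cos(t) I + (1 - cos(t)) k k^T + sin(t) [k]x
    where t = |r| and k = r / t."""
    r = np.asarray(r, dtype=float)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)                 # zero rotation
    k = r / theta
    K = np.array([[0, -k[2], k[1]],      # cross-product matrix [k]x
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return (np.cos(theta) * np.eye(3)
            + (1 - np.cos(theta)) * np.outer(k, k)
            + np.sin(theta) * K)
```

Here r would be built from the two gaze estimate angles before rotating the eye camera's veillance vectors.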
In other words, the bioveillance vectors are centered on the gaze direction of the right eye of the participant, from the reference of the scene camera. To transform the eye camera's (labelled as world in the equation) perspective to the scene camera's perspective (labelled as camera), the following multiplication is performed [98]. U, V, N are the normalized orthogonal components of the camera's coordinate system, and Ux, for example, is the U axis of the camera projected onto the x axis of the world coordinates (in this case the eye camera's axes).
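The multiplication itself did not survive transcription; following the standard world-to-camera view matrix that the description of U, V, N suggests, it would take a form such as:

```latex
\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix}
U_x & U_y & U_z & 0 \\
V_x & V_y & V_z & 0 \\
N_x & N_y & N_z & 0 \\
0   & 0   & 0   & 1
\end{bmatrix}
\begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}
```

with a translation term added when the two camera centers do not coincide.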
The parametric functions are carried out in the same manner as for the camera veillance, computing points of intersection with planes of interest, and verifying whether each point is indeed an interior point of the surface. The distance is computed and the veillance power adjusted. The data is then accumulated across all the frames in which the object is interacted with. Finally, the matrix is normalized and colour mapped using a provided colour scheme. The texture maps are generated, and the 3D models are rendered using the maps to create a 3D bioveillance model.
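The three steps just described (intersection, interior test, inverse-square adjustment) can be sketched as follows. This is an illustrative numpy version with hypothetical names; it assumes a convex surface polygon whose corners are listed counter-clockwise.

```python
import numpy as np

def ray_plane_veillance(origin, direction, plane_point, plane_normal,
                        corners, power=1.0):
    """Intersect one veillance ray with a plane, verify the hit is an
    interior point of the convex surface polygon, and attenuate the
    veillance power by inverse-square distance.
    Returns (hit_point, attenuated_power), or None on a miss."""
    o = np.asarray(origin, float)
    d = np.asarray(direction, float)
    n = np.asarray(plane_normal, float)
    denom = d @ n
    if abs(denom) < 1e-9:
        return None                      # ray parallel to the plane
    t = ((np.asarray(plane_point, float) - o) @ n) / denom
    if t <= 0:
        return None                      # plane is behind the sensor
    hit = o + t * d
    # Interior test: the hit must lie on the inner side of every edge.
    for a, b in zip(corners, list(corners[1:]) + list(corners[:1])):
        a = np.asarray(a, float)
        edge = np.asarray(b, float) - a
        if np.cross(edge, hit - a) @ n < 0:
            return None
    dist = np.linalg.norm(hit - o)
    return hit, power / dist ** 2        # inverse-square decay
```

Accumulating the returned power per texel across frames, then normalizing and colour mapping, yields the texture maps described above.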
Figure 82, bottom right, shows the setup for one example of bioveillance modelling. An audio mixer is set up with two surfaces of interest: one in the same plane as the knobs and controls on the main panel, and another in the same plane as the connector cables, with some premeasured offset from the centers of these surfaces. The ratio of the surface to the marker is also measured and recorded in a configuration file. The corners of both surfaces are recorded as 3D coordinates in an input file for the rendering program (OpenGL). The participant conducted a simple calibration using the eye tracking device and was asked to look around the audio mixer for a short period of about 5 seconds. The top picture of figure 82 shows the computed and rendered bio-veillogram of participant SY gazing over the equalization knobs on the top left corner of the panel. The bottom left shows the render of another run where the participant is looking at the label above the sliders on the mixer. Notice the veillance flux is computed only for the right eye of the participant, even though the render suggests two regions of concentrated veillance flux. Until a dual tracker can be implemented, it can be assumed that the participant has no veillance emitted from the left eye, as if that eye were covered with a patch.
A 3D veillogram provides more insight into the visual sensory field of a participant exposed over 3D objects in space. Rather than a heat map of gaze directions shown as dots or circles, the veillograms also indicate regions of high, medium and low levels of visual perception. For example, if an LED indicator is blinking somewhere in the peripherals, warning, say, that the received signal is oversaturated, the user would have knowledge of that information without even directly looking at it. This information would have been omitted by traditional eye gaze analysis software that relies solely on gaze direction. The veillogram emphasizes that the eyes are a rich array of sensors.
The veillance profile may differ from one individual to another. To improve the accuracy, an eye test can be done for each user, or an average over a large set of participants can be used to estimate bioveillance.
Figure 82: Top: A 3D render of the bio-veillance measured from a participant overlooking some of the control (equalization) knobs on an audio mixer for a brief duration of about 5 seconds. The model recognises two surfaces of interest. Bottom left: Another bio-veillance render of the participant looking at the labels above the bottom sliders on the mixer. Bottom right: A photograph of the audio mixer on which the experiment is conducted. Two ArUco markers are visible on the panel to help the program identify surfaces of interest.
Although the veillograms show the visual attention of the participant over surfaces of interest in the general direction of the gaze, the system is not accurate to the point. There are observable, non-linear offsets between the visualized results and what the participants claimed to have gazed at. There may be many sources of error, with minor sources such as inaccurate pose estimation of the surface markers, or inaccuracies in the eye tracking device and/or the calibration program. One main source of error may be the misalignment of the scene and eye cameras, with inaccurate models describing the perspective transformation from one to the other.
6.5 Improved equipment design using the eyetap principle
The main problem of inaccuracies identified in the previous section is a direct result of the misalignment between the eye and the scene camera, together with the lack of a sufficiently accurate perspective transformation matrix to describe the relationship between the two cameras. This section proposes a newer prototype that eliminates the need to realign the cameras computationally, by optically aligning the cameras themselves using the eyetap principle [99]. The newer system would provide greater accuracy and reduce the computation cycles used for vector transformations.
Figure 83: A conceptual diagram illustrating the eyetap principle. [43] A double sided mirror is placed at 45 degrees in front of the participant's natural optical axis. In this design, the scene camera's optical axis is theoretically perfectly aligned with the optical axis of the eye. Furthermore, the reflection into the eye camera allows the eye camera's axis to be perfectly antiparallel to the natural gaze axis of the eye.
In this design, an adjustable acrylic mirror (referred to as the beamsplitter) is positioned directly in front of the participant's natural optical axis at 45 degrees, as shown in figure 83. The natural optical axis in this paper is defined as the axis formed by the eyes when gazing at an object a theoretically infinite distance away in the forward direction of the person. The scene camera is now placed sideways, perpendicular to the natural optical axis of the eye. Due to the mirroring optics, the camera has the same gaze alignment as the human eye. When using a one-sided mirror, or a beam splitter, the two axes effectively share the same optical axis after the reflection (eye camera) or transmission (eye). This eliminates the need for a perspective transformation. However, there may be a small translational offset, depending on the width of the reflecting mirror, which can be easily corrected by shifting the pixels to adjust for the offset. Figure 84, bottom left, shows the front view of the eyetap implemented, as well as additional photographs of Mann wearing eyetap devices. These images all have a camera aligned with the natural optical axis of the wearer, using a beamsplitter.
Figure 84: Top: Image of Mann wearing a pair of smart glasses that use the eyetap principle, with another example at the bottom right. Bottom left: The newer proposed design of the bio-veillance prototype that uses the eyetap principle. In these photographs, a camera appears at the position of the eye; this is because the beam splitter optics align the optical axis of the camera with that of the eye.
Under this setup, when the eye tracking camera is placed on the other side of the mirror, facing the mirror, it captures the image of the eye with perfect alignment. This can theoretically be used to simplify the eye tracking algorithm, since the center of the frame is aligned with the center of the eye. Depending on the transmissive and reflective properties of the glass used, the user may see through more of the glass at the expense of less clarity from the camera feeds. Currently, gold-coated and aluminum acrylic mirrors are used for testing. While the gold-coated mirror reflects visible light better, it blocks all light from passing through and therefore obstructs the user's field of view. On the other hand, the acrylic reflects less optical light but also transmits light through, allowing the user to see through the beam splitter.
Figure 85: A photograph of the proposed prototype using the eyetap principle to align the eye and camera axes. The prototype consists of an IR regulator connected to an IR LED, two miniature cameras (scene and eye cameras), a beam splitter which is currently being designed and tested, and the headframe upon which the components lie.
Figure 85 shows a photograph of the newer iteration of the bio-veillance prototype. The infrared regulator circuit from the previous prototype is made compact using surface mount components, and/or replaced with smaller parts. The infrared LED is secured at the corner of the acrylic housing, with the diagonal side built from beam splitter material. Currently the beam splitter material is a gold plated mirror, but this is likely to change. The cameras used in the prototype are reduced in size. The miniature cameras sacrifice resolution, clarity and responsiveness to light in the infrared range compared to the previous web cameras, to accommodate the compact glass design. The housing unit is secured to the head frame via nuts and screws attached to a slider module, so the position of the housing can be adjusted, or the housing replaced by another module (such as a display unit) altogether. This type of head frame design is inspired by the open eyetap project, founded by Mann, Lv, Yang, and Park. [100] Figure 86 shows a photograph taken from the open eyetap website. [101] Depending on the needs of the user, the various independent modules can be moved or added onto the existing frame. Various modules have been implemented, such as thermal imaging, a memory aid system, RADAR, and others.
Figure 86: A photograph taken from the open eyetap project website. [101] This prototype is designed for open source collaboration, and its sensor systems are fully modular.
Theoretically, the new prototype offers a better alignment of axes between the two sets of images, without changing the contents of the images themselves. The incoming frames from the cameras would be very similar to those of the original system; transformations such as flipping and/or rotating the image can be used to match the orientation of a regular imaging system. Two identical digital cameras may be used to calibrate the pixel offset between the two systems, to account for the depth of the beam splitter (assuming that the surfaces of the splitter are perfectly parallel). New threshold parameters can be trained to adapt to the brightness and contrast of the frames to allow detailed contours to be detected. Once the contours are detected, the process would be identical to that of the previous prototype.
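Such orientation-matching transformations can be sketched as follows. Which flips and quarter-turns are actually needed depends on the final mirror and camera mounting, so the defaults here are illustrative assumptions.

```python
import numpy as np

def match_orientation(frame, flip_horizontal=True, quarter_turns=0):
    """Undo the mirror flip and any rotation that the eyetap optics
    introduce, so downstream processing sees a frame oriented like
    the original (non-mirrored) system's."""
    out = np.asarray(frame)
    if flip_horizontal:
        out = out[:, ::-1]               # undo the mirror reflection
    if quarter_turns:
        out = np.rot90(out, k=quarter_turns)
    return out
```

A fixed pixel shift (for the beam splitter's depth) would be applied after this step, once calibrated.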
Because the newer prototype was still being designed and tested at the time of writing, the system is yet to be fully implemented and tested. Some current challenges in improving the system are to select the appropriate camera systems and beam splitter material to allow a sufficient amount of pupil and facial features to be detected by the camera. Without increasing the infrared illumination beyond a harmful threshold, the material needs to be efficient enough at reflecting infrared light into the eye camera, and the camera needs to be sensitive enough to register the changes in infrared received. All this must be done while fitting the components into the mechanical system shown above. Figure 87 shows two images captured from bio-veillance systems, with the older prototype shown on the left and the eyetap design shown on the right. Notice that, due to the effects of the reflection and the relative alignment of the eye camera on the eyetap device, the image captured is a flipped, rotated version of that from the older model. Overcoming the illumination challenge, and integrating the system using the previous framework, may be considered one aspect of future work for this paper.
Figure 87: Left: A cropped image captured from the eye-looking camera of the original bioveillance system. The image shows a significant amount of contrast between the iris and the pupil. Accurate detection of the center of the pupil is essential for the eye tracking method employed. Right: A cropped image captured from the eye tracking camera of the eyetap bioveillance system. A gold plated mirror is visible in the image, and through its reflections a rotated image of an eye illuminated by infrared is barely visible.
6.6 Summary
In this chapter, the concept of veillance from a video camera is expanded to the human eye. A rough eye testing methodology is created to estimate the bio-veillance power as a function of space, using eye absement as the metric. Using these approximations of bioveillance power, bioveillograms are generated using a bioveillance prototype that combines an eye tracker and a surface (marker) detector. The bioveillograms show the sensory attention of the human eye as a non-uniform, sensory-rich array of sensors. The implementation of the eye tracker is explained in detail, along with a suggested improvement to the optics of the system to align the eye's natural optical axis with that of the camera. The optics of the eyetap principle are explained through the newly proposed prototype.
Chapter 7: Vixel distributions and blurring
Looking at the veillogram produced in figure 88, which is duplicated from figure 64, it is noticeable that the patterns produced on the model are not very smooth. Looking at the edges of the veillance projections, one will notice intertwining, repetitive patterns of some level of veillance followed by no veillance at all (shown as black pixels).
Figure 88: Left: replicated image from figure 64, the veillogram produced by a camera. In the regions indicated by the red circles, a repetitive pattern of intertwined light and dark pixels is clearly visible. Right: The bottom circle is zoomed in to enhance details.
The result is very misleading, as the black stripes suggest that these spots were not at all visible to the camera, while their neighbouring pixels were. The reason this occurs is that the veillance vector model has been simplified for practical computation purposes, where an entire vixel area is represented as a single pixel on the image.
This chapter explores the definition of a vixel, and then describes the theoretical framework for modelling and measuring the distribution of veillance within individual vixels. The chapter applies these findings to suggest a correction to the veillogram issue shown above. The chapter relates vixel distribution to various types of image blurring, and proposes methods to deblur such images. Lastly, mathematical and theoretical formulations are expanded from the existing Mann-Janzen work on veillametry. This models sensory flow as the reverse of signal flow, in the context of vixel regions to sensors. This idea can then be applied to other forms of sensory flow, such as within electronic circuits, as proposed by Janzen, Yang, and Mann. [43]
7.1 Vixel definition, spatial resolution, and vixel overlap
As defined earlier in this paper, the vixel is the physical surface that corresponds to each pixel of the camera. Using the extramission approach, a vixel can be thought of as the ray of sensing radially emitted by the pixel, and whatever surface(s) the ray hits can be considered the vixel. In other words, the vixel is the radial projection of a pixel outwards into space, until it hits some physical surface with some photoquantity, which contributes to the value of the pixel. Figure 89 shows a diagram illustrating vixels propagating from a camera (shown as a grey dot), and an estimation of two sets of vixel surfaces, one radially closer to the camera and the other farther away. The figures are shown for two camera orientations.
Figure 89: Figures showing two sets of vixels produced from a pinhole camera, one closer and the other farther radially from the camera. As shown by the vixels, the camera on the left has a resolution of 4x4 pixels, while the one on the right has a resolution of 5x5 pixels.
When the vixel surfaces are closer to the pinhole camera, the vixel areas are smaller and have a higher spatial resolution (density) than vixels farther away. This is because the pixel value is a function of the sum of all the details within the vixel. The summation of details over a larger surface makes it harder for finer details to be resolved. An example of this is shown in the introductory chapter with aerial photographs of a city.
Although shown as perfect radial arc sections in the figure above, the spatial sensitivity of an individual vixel is better described as a sensitivity distribution, shown in figure 90. [60] This figure is adapted from Janzen and Mann's previous work on veillance. [17][18] Due to the optics of the camera, the photosensitivity is spread spatially across the vixel. Furthermore, the spread of sensitivities overlaps adjacent vixels, causing many-to-one mappings between vixel regions and pixel values in some areas. One hypothesis is that a component of image blurring is caused by the overlapping of vixels. This is because overlapping vixels dictate that a significant amount of photoquantity is shared between neighbouring pixels, thus smoothing (blurring) the image. The next sections will attempt to design experiments to compute the distribution as a simplified matrix, and use it in an attempt to deblur the images.
Figure 90: Figure adapted from Janzen's veillance paper. [17] The spatial sensitivity for vixels (pixels) is shown as a spatial distribution of photosensitivity over the vixel region. A spread of the sensitivity values overlaps adjacent vixels.
7.2 Method to measure vixel distribution and overlap
In this section, two sets of experiments are proposed. The first is to observe the veillance distribution of a vixel through experimental testing; the distribution varies depending on the optics of the camera, but can often be modelled as a Gaussian distribution, although inverse polynomial models are also reasonable. The second experiment is to measure the amount of vixel overlap that adjacent pixels of the same camera have on each other, under the assumption that the camera has no distortion, so that the center pixels are representative of the others in the same uniform grid.
7.2.1 Ideal vixel distribution
Theoretically, in the ideal case, the amount of veillon overlap is minimal, with the photosensitivity spread as evenly as possible across the vixel area. Given the Gaussian spread assumption, a low veillon overlap corresponds to a low variance distribution (steep slopes) over a smaller area, so that the amount of overlapping veillon power is minimized. In this case, the unit of veillance is uniformly distributed across the vixel region.
7.2.2 Experimental setup for measuring veillance distribution in a single vixel
To measure the amount of veillance power on the various parts of the vixel region, and hence the vixel's veillance power distribution, an experiment is proposed, shown in figure 91.
Figure 91: The proposed experimental setup for measuring the veillance power distribution over a vixel region. A stationary camera is pointed towards a computer screen loaded with a plotter program. The distance between the camera and the screen is adjusted so that a significant number of screen pixels fall into the same vixel region.
The proposed setup consists of a stationary, preferably low resolution camera, pointed towards a computer screen loaded with a plotter program. The distance between the camera and the screen is adjusted such that a significant number of screen pixels are contained within the area of one vixel region. The vixel region dimensions (m by n) can be computed using the center-most camera pixels, where the angle of deviation is very small relative to the optical axis and the small angle approximation holds:
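The equation itself did not survive transcription. A plausible reconstruction, consistent with the definitions of px, py, M, N and D that follow, is given below; θh and θv (the horizontal and vertical fields of view) are introduced here only for illustration:

```latex
m = \frac{M}{p_x}, \qquad n = \frac{N}{p_y},
\qquad M = D \tan\!\left(\frac{\theta_h}{2}\right),
\qquad N = D \tan\!\left(\frac{\theta_v}{2}\right)
```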
where px and py are half the pixel resolution of the camera; if the camera is 640x480, then px = 320 and py = 240. M and N are half the horizontal and vertical distances seen by the field of view of the camera, D meters away from the camera. This model holds under the assumption that the camera has a negligible amount of distortion. The distance, D, is adjusted until m and n cover about one ninth of the screen pixels.
Before the plotter program begins, a calibration program is run. The calibration opens a window identical to the actual plotter window, with the same center. A large white crosshair is produced in the middle of the black screen. Using video feedback similar to that shown in the introductory chapter, the screen and/or the camera is then adjusted for perfect alignment. Any small amount of error in the positive feedback loop will significantly distort the crosshair.
Next, the plotter application allocates an array for each of the pixels on the screen, and then one by one toggles each pixel from black to white to record the effect that toggling the pixel has on the sensor reading of the center-most camera pixel. That pixel is then toggled back to black. The plotter application is run 5-10 times to ensure a high signal-to-noise ratio, assuming the noise is unbiased. The data is then normalized with respect to the extrema, and then color mapped to produce figure 92. The initial experiment used supercells that are 4x4 screen pixels. The details of the experiment setup used to produce this figure can be found in the figure description. It is noticeable that there are trace amounts of signal change on the background, which may be caused by changes in ambient lighting. Figure 93 shows another set of data obtained in a darkroom. It is visualized with a slightly changed color mapping function to enhance the details, and with the supercell size changed to 2x2, while other factors are kept constant.
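The toggle-and-record loop can be sketched as follows; `fake_center_pixel` is a stand-in for reading the real camera's center pixel, and its Gaussian response is purely an assumption for the demo:

```python
import numpy as np

def measure_veillance_map(read_center_pixel, screen_shape, supercell=4, runs=5):
    """Toggle each supercell of the (black) screen to white, record the change
    in the centre camera pixel's reading, average over runs, and normalise to
    the extrema."""
    h, w = screen_shape
    acc = np.zeros((h // supercell, w // supercell))
    for _ in range(runs):
        baseline = read_center_pixel(np.zeros((h, w)))
        for i in range(h // supercell):
            for j in range(w // supercell):
                screen = np.zeros((h, w))
                screen[i*supercell:(i+1)*supercell,
                       j*supercell:(j+1)*supercell] = 1.0   # toggle to white
                acc[i, j] += read_center_pixel(screen) - baseline
    acc /= runs
    span = acc.max() - acc.min()
    return (acc - acc.min()) / span if span else acc

# A stand-in "camera": the centre pixel responds as a Gaussian-weighted sum
# of the screen (assumed response, for demonstration only).
def fake_center_pixel(screen, sigma=6.0):
    h, w = screen.shape
    yy, xx = np.mgrid[0:h, 0:w]
    weight = np.exp(-((yy - h/2)**2 + (xx - w/2)**2) / (2 * sigma**2))
    return float((screen * weight).sum())

vmap = measure_veillance_map(fake_center_pixel, (32, 32), supercell=4, runs=2)
```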
Figure 92: A color mapped representation of veillance power distribution over its vixel region and the area around it. The figure is 128 by 128 pixels, taken by a webcam with a field of view of approximately 80 by 60 degrees, approximately 1.24 meters away from the screen. The theoretical vixel border is drawn on the figure as a blue square, as reference.
Figure 93: Another set of data collected under the same setup as figure 92, with the supercell size reduced to 2x2. The blue rectangular border is carefully drawn by looking at the veillance centers of neighbouring pixels in order to bisect them as evenly as possible. The rectangle only serves as reference and is not representative of the vixel region.
The veillance power distribution collected needs to be more accurate, and to contain more data points, before a generalized model can be constructed. In future work, a laser is proposed to increase sub-vixel resolution. Adjusting the direction can be disregarded for small angle steps (center pixels). The laser would be placed on a device similar to the 2D plotter explained earlier in this thesis. It is apparent that there is some form of non-linear distribution of power inside and outside of the theoretical vixel region. This is confirmed by the fact that the centers of vixel mass are closer to their neighbours than their region center-to-edge distance, or radius, when assuming a circular vixel region.
7.2.3 Experimental setup for measuring vixel overlap

The central-most pixels are selected for this experiment in an attempt to minimize errors caused by camera distortion effects, such as barrel distortion and vignetting. In addition, when the optical angle offset is minimized, the error between using a planar surface model and the ground truth spherical surface model, established in earlier chapters, is also minimized.
Under the extramission model, the assumption made earlier is held: the near-center pixels are identical in their veillance distribution (uniform for small angles of incidence). This results in the mutual overlapping of identical distributions over a vixel region. For simplicity, assume that under the low variance model the total amount of area two vixels or further away is negligible (far less than the area under the vixel or its immediate neighbour, as observed in figure 92). Figure 94 shows, in one dimension, three neighbouring vixels that overlap the center pixel.
97
Figure 94: A diagram of veillance distribution modelled as a low variance Gaussian. The neighbouring pixels near the center of the camera are assumed to be identical. The figure shows a vixel region and regions of overlap.
From the previous experiment, the effect of toggling each of the screen pixels on the camera pixel's neighbours is also noted. The amount of veillance strength overlap at a screen pixel with its neighbour is determined as the minimum value of the two readings. The total veillance overlap, O(a,b), one pixel has with its neighbour (referred to as a and b respectively) is the spatial integral of the minimum of the two values, over all screen pixels inside the vixel region described below. Non-zero overlap regions indicate that these areas affect the pixel value readings of both of the neighbouring pixels, contributing to optical blurring.
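The overlap measure described above, the discrete integral of the pointwise minimum of two response maps, can be written directly (the toy response values are assumptions):

```python
import numpy as np

def veillance_overlap(resp_a, resp_b):
    """Total veillance overlap O(a, b) between two camera pixels: the integral
    (here, a discrete sum over screen pixels) of the pointwise minimum of the
    two measured response maps."""
    return float(np.minimum(resp_a, resp_b).sum())

# Two toy response maps with partially overlapping support.
a = np.array([[0.0, 0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0, 0.5]])
b = np.array([[0.5, 1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5, 0.0]])
overlap = veillance_overlap(a, b)  # non-zero => these areas blur both pixels
```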
Figure 95 illustrates the colour mapped image representing the amount of overlap under this
configuration. The data is collected from the center-most 3x3 pixel grid of the camera. The figure is
derived from figure 93. The colour map is adjusted to enhance data differences.
Figure 95: Image showing the amount of veillance power overlap the center-most camera pixel has with its immediate 8 neighbours. Note that the overlap of one neighbour pixel with another is not included in this visualization.
7.3 Optical blurring and vixel overlap

In basic image processing, image blurring can be achieved using image kernels, or the image convolution of a small matrix over every pixel in the image. The kernel used in this example is 3x3, as shown in figure 96, taken from an online tutorial [42]. The figure visually shows how one pixel in the blurred image is computed from the original image using a blur kernel.
Figure 96: A screenshot of an online interactive program [42] that demonstrates how to blur an image using image convolution with a blur kernel. The figure shows how a 3x3 pixel region taken from the input image on the left is combined with a kernel matrix to produce the resultant blurred pixel in the output image on the right.
Examining the blur matrix, it is noticeable that the output is the normalized distribution of weights applied to the center pixel and its immediate neighbours. In this case, there is significant blurring, as the adjacent pixels weigh half of the central pixel and the corners have one quarter the weight. The matrix recorded in the previous step and the blurring matrix have a lot in common, as having vixel overlap will cause blurring. In the introductory section example, where the camera is fogged and all the vixels are heavily diffused (overlapped), all the pixels reveal very similar pixel values, giving the image a blurry appearance.
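The kernel described above (adjacent pixels at half the center's weight, corners at one quarter) can be applied with a direct convolution sketch:

```python
import numpy as np

def convolve2d(image, kernel):
    """Blur an image by sliding a (normalised) kernel over every pixel,
    zero-padding at the borders."""
    kernel = kernel / kernel.sum()                 # normalise the weights
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = (padded[y:y+kh, x:x+kw] * kernel).sum()
    return out

# Adjacent pixels at half the centre's weight, corners at one quarter.
blur_kernel = np.array([[0.25, 0.5, 0.25],
                        [0.5,  1.0, 0.5 ],
                        [0.25, 0.5, 0.25]])
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0                                  # a single bright pixel
blurred = convolve2d(sharp, blur_kernel)           # spreads over 3x3 neighbours
```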
Overlapping of vixels causes blurring because a component of the neighbouring pixels contributes to multiple pixel values. There is thus a direct contribution to image blurring caused by vixel overlap, shown in figure 97. With this in mind, the vixel experiment is repeated, however this time the camera is purposely set out of focus with respect to the subject matter, to varying degrees. The experiment continues from the previous setup, using the same configuration, with only the focus changed.
Figure 97: The vixel distribution for three blur settings of a web camera is measured. From the top, a relatively in-focus image and its center-most vixel distribution are illustrated, followed by another with a significant amount of blurring, then by one with an extreme amount of blurring.
The relationship between the vixel region, the veillance power distribution, and their effects on optical blurring is verified in figure 97; the more out-of-focus the object is from the camera, the larger the vixel region becomes, with a wider distribution of veillance strength over the region. Extending this finding with the extramissive optics framework, the amount of veillance distribution and blurring can also be described as a function of space, depending on the optical system employed. Figure 98 illustrates a ray tracing diagram of a camera's optics. Without loss of generality, two random photosensors from a densely packed sensor array are selected and their rays traced as a function of space. The further away a subject matter is from the focal point of the system, the larger the veillance cone is for that area, which not only decreases vixel density, or spatial resolution, but also introduces a significant amount of blurring, as nearby pixels have a large overlap of spatial content. As the subject matter moves to the focal point, the vixel regions converge and become smaller, and so does the effect of blurring caused by the overlap of vixel regions.
Figure 98: A ray tracing diagram showing the optical workings of a typical digital camera. In the extramissive framework, the sensors 'emit' veillance, or the ability to sense, outwards through an optical system, causing the sensitivity to information to form a cone that converges at the focal point of the system.
The amount of blurring can be quantified as a function of spatial overlap, or more definitively, the number of effective veillons counted. To quantify the number of unique bits of information sensed (veillons) as a function of space, a few definitions are to be established. Given a uniform distribution of veillance power, the amount of unique information sensed by all sensors is proportional to the total amount of vixel regions covered by the pixels, not counting repetitions. In other words, the total veillon count, Veff, is the total number of vixels (Vtotal) emitted by the camera, minus any overlapping areas (Voverlap), with the first overlap still accounted for: Veff = Vtotal − Voverlap. Furthermore, the amount of blur the camera senses can be quantified as a function of this overlap.
The definition is now explained by an example. Consider the three cases of veillance power overlap illustrated in figure 99. In the first case, where there is complete overlap, the two pixels would always produce the same result, and in effect the amount of unique information from these sensors is only one veillon (100 percent blur). In case 3, where the two sensors cover two independent vixels, the number of veillons per frame is two (0 percent blur). In the case of a partial overlap, the veillon count is two minus the overlapping region, which is counted once.
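The three cases of figure 99 can be checked with a small sketch; the blur normalisation used here is an assumption chosen only to match the 100-percent and 0-percent limits stated above:

```python
def effective_veillons(n_vixels, overlap_area):
    """Veff = Vtotal - Voverlap, with each overlapping region counted once.
    Areas are in units of one vixel region (uniform distributions assumed)."""
    return n_vixels - overlap_area

def blur_fraction(v_total, v_eff):
    """Hedged normalisation of 'amount of blur': 0 when all vixels are
    independent, 1 when every sensor sees the same vixel (an assumption,
    not a formula given in the text)."""
    return (v_total - v_eff) / (v_total - 1)

# Figure 99's three cases for two sensors:
full    = effective_veillons(2, 1.0)   # complete overlap -> 1 veillon
partial = effective_veillons(2, 0.4)   # partial overlap  -> 1.6 veillons
none    = effective_veillons(2, 0.0)   # no overlap       -> 2 veillons
```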
Figure 99: Figure illustrating the different scenarios in which two vixels can overlap each other. Any additional vixel introduced to the system can only increase the total effective veillon count by an amount between 0 and 1. For uniform veillon distributions, the overlapping surface area can be used to estimate the effective veillon count.
In the case of a digital camera, when the pixels are further away from the optical axis, a more detailed model of veillance distribution is needed. Using lens equations and basic trigonometry, the radius and orientation of the vixels can be computed when the sensor separation distance and the effective focal length of the lens system are known. For modelling purposes, an optical sensory model can be estimated by having each sensor emit an evenly distributed array of vectors and tracing their distribution through the optical system onto vixel surfaces.
An algorithm is proposed to compute the effective veillon count, which can be used to compute the amount of blur of a camera, given an arbitrary scene. Start with an empty pixel set, U = Ø, and the set P that contains all the pixels of the camera. Starting with one pixel from P, the effective veillon count, Veff, is incremented by the weight of the veillance distribution integrated over the vixel region, as long as the differential surface is not part of a vixel that belongs to the set U. After each pixel is processed, it is removed from the set P and added to the set U. In the case where the differential integrating surface belongs to both the active pixel and one or more existing pixels in the U set, the distribution weights are compared. If the current weight is higher than all existing weights over that area, then the veillance count replaces the previous count; otherwise the process continues to the next surface or pixel. Note that in order for the surface area considered to be differential, it must be small enough that it is fully occupied by any vixel distribution over it.
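A discrete sketch of the proposed set-based procedure: over a grid of differential surface cells, keeping only the highest weight wherever vixels overlap is equivalent to summing the per-cell maximum over all weight maps (the toy weights are assumptions):

```python
import numpy as np

def effective_veillon_count(weight_maps):
    """Each camera pixel contributes its veillance weight over each
    differential surface cell, but where several pixels' vixels overlap,
    only the highest weight is kept. Iterating pixel by pixel (moving each
    from set P to set U) while keeping the running per-cell maximum
    implements the replacement rule described in the text."""
    claimed = np.zeros_like(weight_maps[0])    # best weight seen so far per cell
    for w in weight_maps:                      # pixel moves from P to U
        claimed = np.maximum(claimed, w)
    return float(claimed.sum())

# Two unit-mass vixels sharing one surface cell.
p1 = np.array([[0.5, 0.5, 0.0]])
p2 = np.array([[0.0, 0.5, 0.5]])
v_eff = effective_veillon_count([p1, p2])   # overlap cell counted once
```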
7.4 Image deblurring using the vixel distribution matrix

Given the unblurred image u and an estimate of the blur kernel matrix m (similar to that of figure 96, but possibly with a bigger kernel size), the resultant image, b, is computed as:

b = u * m

where * is the convolution operator. For the application of deblurring a blurred image b, knowing the vixel distribution of the camera m, the unblurred image can be estimated as the deconvolution of b by m.
Let U, B, and M represent the fast Fourier transforms (FFT) of u, b, and m, respectively. The above operation described in the frequency domain is simply B = UM. Therefore, to attempt to unblur the image, the solution in the frequency domain is proposed as U = B/M. Once that is computed, the inverse Fourier transform is computed to produce the deblurred image.
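A sketch of the frequency-domain deblurring. The eps regulariser is an added assumption (plain U = B/M divides by zero wherever M vanishes); a Wiener filter is the usual robust variant:

```python
import numpy as np

def pad_kernel(kernel, shape):
    """Zero-pad a small kernel to the image shape and roll its centre to the
    origin, matching the circular-convolution model used by the FFT."""
    kh, kw = kernel.shape
    padded = np.zeros(shape)
    padded[:kh, :kw] = kernel
    return np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def blur_fft(u, kernel):
    """b = u * m computed in the frequency domain: B = U . M."""
    M = np.fft.fft2(pad_kernel(kernel, u.shape))
    return np.real(np.fft.ifft2(np.fft.fft2(u) * M))

def deblur_fft(b, kernel, eps=1e-6):
    """U = B / M followed by the inverse FFT; eps guards the near-zero
    frequencies of M (an added safeguard, not part of the plain method)."""
    M = np.fft.fft2(pad_kernel(kernel, b.shape))
    U = np.fft.fft2(b) * np.conj(M) / (np.abs(M) ** 2 + eps)
    return np.real(np.fft.ifft2(U))

kernel = np.array([[0.25, 0.5, 0.25],
                   [0.5,  1.0, 0.5 ],
                   [0.25, 0.5, 0.25]]) / 4.0      # figure 96 style kernel
yy, xx = np.mgrid[0:16, 0:16]
u = 1.0 + np.cos(2*np.pi*xx/16) * np.cos(2*np.pi*yy/16)  # smooth test image
b = blur_fft(u, kernel)
u_hat = deblur_fft(b, kernel)   # close to u away from the zeros of M
```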
For the assumed kernel matrix, which describes a zero-mean Gaussian model, the variance of the model can be computed using the neighbouring pixels' photoquantity distribution, as there is a one-to-one mapping between the variance and the amount of vixel overlap (the area integral of the overlap outside of the vixel region). Since the FFT of a Gaussian function is also a Gaussian function, the filter function acts as a low-pass filter in the frequency domain. This causes blurring as high spatial frequencies (details) are filtered out. However, since the method holds accurately only for pixels near the optical axis, where the small angle assumption holds, the deblurring may be less effective for off-axis regions.
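The low-pass claim can be checked numerically: the FFT magnitude of a sampled Gaussian decays monotonically away from DC (the sample length and sigma below are assumptions for the demo):

```python
import numpy as np

# Sample a zero-mean Gaussian kernel and inspect its frequency response.
n, sigma = 64, 3.0
x = np.arange(n) - n // 2
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()                                  # normalise to unit mass
# Shift the kernel centre to the origin before the FFT.
G = np.abs(np.fft.fft(np.fft.ifftshift(g)))
# G[0] = 1 (DC passes unchanged) and G decays towards the Nyquist frequency,
# so convolving with g suppresses high spatial frequencies (details).
```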
In the case where the camera or the subject is in motion during the capture (kineveillance), the vixel area would be a time integral of the surfaces of exposure. This causes another form of blurring, known as motion blur. The kernel can be approximated from the motion of the camera (or the camera with respect to the moving object), because the vixel coverage is the integral of the path of motion. The weights of the kernel are the amounts of time the camera spends over each area.
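A sketch of the motion-blur kernel construction: sampling the motion path at equal time steps and counting the dwell time per cell (the toy trajectory is an assumption):

```python
import numpy as np

def motion_blur_kernel(path):
    """Approximate a motion-blur kernel from the camera's path: the vixel
    coverage is the time integral of the motion, so each kernel cell is
    weighted by the time the camera dwells over it (one path sample equals
    one unit of exposure time)."""
    path = np.asarray(path)
    ymin, xmin = path.min(axis=0)
    ymax, xmax = path.max(axis=0)
    kernel = np.zeros((ymax - ymin + 1, xmax - xmin + 1))
    for y, x in path:
        kernel[y - ymin, x - xmin] += 1.0
    return kernel / kernel.sum()          # normalise total weight to 1

# A horizontal pan sampled at five instants.
k = motion_blur_kernel([(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)])
```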
7.5 Upgrading veillogram renders

With greater insight into the vixel distribution from this chapter, the vector model for camera veillance can be upgraded to reflect knowledge of the veillance distribution model. In future work, when the models are generalized, these data can be used to replace the simple uniform distribution that is currently employed. The vectors can also be placed and directed to mimic the effects of sensor focus and sensory attention. Visually, this can be differentiated by dense regions of veillance versus wide, spread-out areas of veillance distribution. When a camera is pointed at subjects out of focus (for wearable computing applications), or if there is a method to understand eye focus, possibly using EEG sensors, the amount of effective veillons (sensory attention) can be visualized as well, to understand how sensors are interacting with the surfaces around them. The quantification of sensory attention can be used in a variety of situations. Alarms can be placed near the driver, or in an aircraft cockpit, to alert the operators in case of prolonged sensory inattention. Applications regarding memory aid [101] can complement humans' occasional moments of inattention during situations where information is produced, such as during a conversation or a lecture. In terms of perfecting a humanistic intelligence system, sensory attention adds another dimension to quantifying the observability and controllability aspects of the humanistic intelligence feedback system.
Furthermore, the current visualization program is proposed to have the number of vectors emitted dynamically adjusted as a function of the distance to the intersecting surface. This is to address the problem, seen at the beginning of this chapter, with the veillance gaps. Currently, multiple arrays of uniformly distributed vectors are implemented to approximate this effect, but they are to be replaced by dynamically resolution-changing Gaussian or other suitable models when these become available.
7.6 Veillametric formulations on sensory flow

Through the examples in this chapter of the relationships between vixels, blurring effects, and effective veillon counts, this section extends the earlier veillametry theory outlined in the IEEE GEM paper '"Painting with the eyes": Sensory perception flux time-integrated on the physical world' by Janzen, Yang, and Mann [43]. This section creates some related theoretical definitions and axioms regarding extramissive optics, referred to as sensal propagation, that extend to multiple light sources or optical stimuli. This work attempts to quantify the amount of independent information received from a digital camera per frame, although it could be extended to electronic signals, mathematical models, and circuit diagrams, where inputs and outputs are involved [43].
Looking at the context of multiple stimuli: as long as a sensor is operational, it emits one veillon outwards in space per sampling period, which is one frame in the context of a video camera. When the sensor is able to sense the effects of a stimulus (referred to as a phenomenon in the publication, represented as ui), or when changes in the level of the stimulus can affect the readings of the sensor, represented as y, then it is concluded that the level of the sensor output is some function, f, of the stimulus: y = f(ui). The sensory contribution of the stimulus with respect to the sensor, V(ui|y), describes the distribution of veillons onto the various input sources. If the sensor is exposed to a single stimulus, then V(ui|y) is 1, or one hundred percent of the sensory capacity is used to sense that stimulus. If the sensor is not sensing a stimulus at all, then V(ui|y) is 0. In a case where the sensor is exposed to multiple stimuli, including environmental noise, the contribution of one stimulus is somewhere between 0 and 1. The input itself is recoverable if the inverse mapping of f is unique, given that V(ui|y) is 1, similar to the HDR compression recovery work from earlier.
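One hedged way to make V(ui|y) concrete: apportion the sensor's one veillon per sampling period across stimuli by their share of the total response. The proportional rule is an assumption; the text only fixes the single-stimulus (1) and unsensed (0) limits:

```python
def sensory_contribution(responses):
    """Sketch of V(ui|y): split one veillon across stimuli in proportion to
    each stimulus's share of the sensor's total response magnitude. The
    proportional-share rule is an assumed model, chosen so that a single
    stimulus gets 1 and an unsensed stimulus gets 0."""
    total = sum(responses.values())
    if total == 0:
        return {u: 0.0 for u in responses}
    return {u: r / total for u, r in responses.items()}

single = sensory_contribution({"lamp": 2.5})               # one stimulus -> 1
mixed = sensory_contribution({"lamp": 3.0, "noise": 1.0})  # shares sum to 1
```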
From the extramission theory aspect more relatable to this thesis, with respect to the effective veillon count: given a vixel region surface, defined earlier in this chapter, the contribution of all independent veillons emitted to the surface from various sensors, V(yi|s), is 1 when the surface is occupied by only one sensor, and is 0 when there are no sensors at all covering that surface. The contribution is somewhere between 0 and 1 based on the amount of information overlap between the various sensors. As an instrument that measures the photoquantity of a scene, the camera can make better predictions when the effective veillon distribution of each sensor with respect to the scene is obtained. The veillon distribution reduces the amount of information entropy compared to just the raw values themselves, when attempting signal recovery algorithms such as the deconvolution discussed for image deblurring. For practical purposes, the distribution can be used to better model the information overlap and one form of error in the data received by sensors.
Although only presented from the point of view of visual sensors, the idea of analyzing sensory flow (or sensal flow in the publication) can be extended to analyze the amount of non-redundant sensing capacity of various sensors, logical and electronic circuits, and mathematical, statistical, and probability models. Sensal flow is a unique extramissive approach to quantifying sensing capacities that is more than simply the time reversal of signal flow. The amount of information input overlap, referred to as sensory attention, can be further expanded to model redundant measurements in models and equations as well.
7.7 Summary

This chapter has examined the spatially projected region of a pixel in digital cameras, known as the vixel region. Methods for defining and measuring veillance power distribution over such vixel regions are also explored. Detailed methods of defining and computing the effective veillon count are described, and its relation to optical blurring is explained theoretically and using collected data as examples. The chapter proposed methods to unblur images captured from such cameras by modelling the veillance distribution and using it as a kernel fed into a deconvolution function. The veillogram rendering program is proposed to improve on the simple veillance vectors by dynamically adjusting its resolution. The important term of sensory attention is introduced as an extended application of veillametry. The chapter concludes with a theoretical section on the extramissive concept of sensory flow.
Chapter 8: Conclusion

This chapter summarizes the thesis "Veillametrics: An extramissive approach to analyze and visualize audio and visual sensory fields and their usage" by identifying how the work done contributes to the community at large, and also identifying future work needed for this thesis and the study of veillametry in general.
8.1 Contribution

Methodologically, this thesis yields an extramissive approach to measure, analyze, and understand the sensing capacity of various sensors, whether audio, video, or other modes such as electromagnetic waves. Veillametry allows the generation of detailed models of sensory perception over space, and creates 3D veillograms that allow user behaviour and product usage to be studied with complex veillance fields. The thesis builds on earlier works in the field of veillography, methodologically quantifying the relative values of the ability to sense as a function of space. Furthermore, using the extramissive framework, surfaces can be tracked and veillance fields traced to produce veillograms that help visualize these quantified values.
As identified in the application section of the introductory chapter, veillography has multiple uses for the community: artistic, political, scientific, and educational. Immersive gaming, consumer behaviour and psychology studies, safety systems such as driver-attention enforcement units, veillance attention program optimizers, image processing, product design, and teaching tools are some applications of this work, to name a few. From a humanistic intelligence design perspective, the detailed study of sensors allows better systems that optimize both the observability path (bioveillance) and the controllability path (sensor veillance). An improved human-machine feedback loop creates a natural synergy that improves HI system performance.
8.2 Future work

Future work continues to better quantify veillance models, building on Professor Mann's earlier work from the 1970s [13][14], as well as Ryan Janzen's work on veillametry from recent years [17]. The goal is to eventually generalize detailed 3D veillance models for additional modalities of veillance, and to quantify veillance as a detailed mathematical formulation.
From the prototypes' point of view, a better surface detection program is suggested to replace the one used in section 6.2 of the thesis: the marker tracking system could be replaced with edge detection and tracking, while devices such as accelerometers and gyroscopes are used to minimize blind spots when tracking and computing veillance exposures. As for the bio-veillance prototype, it is a work in progress at the time of writing; the work of aligning the camera axis and the eye axis still needs to be perfected. As identified previously, the optics need to be refined for the computer vision programs to execute properly.
The accuracy of the veillance models can be further improved in future work to improve veillance and veillogram estimates, since this thesis provides mostly the methodology to produce rough estimates of veillance quantities. The field of view and test resolution can be improved to generate more accurate models.
In chapter 7, better methodology or equipment is needed to improve the subpixel resolution of the data obtained. The goal is to obtain a sufficient number of data points with a high signal-to-noise ratio, so that the veillon distribution can be modelled and studied. In that chapter the term sensory attention is also formulated, and its definition and quantification remain to be refined.
References
[1] Statista, “Global augmented/virtual reality market size 2016-2022 | Statistic”, Statista, 2018. [Online]. Available: https://www.statista.com/statistics/591181/global-augmented-virtual-reality-market-size/ . [Accessed: 02-Sep-2018]
[2] Jitesh Ubrani, “Global wearables markets grows 7.7% in 4Q17 and 10.3% in 2017…”, IDC, 2018. [Online]. Available: https://www.idc.com/getdoc.jsp?containerId=prUS43598218 . [Accessed: 02-Sep-2018]
[3] IRobot, “Roomba 980 Wifi connected wireless vacuum, product specifications”, IRobot. 2018. [Online]. Available: http://store.irobot.com/default/roomba-vacuuming-robot-vacuum-irobot-roomba-980/R980020.html . [Accessed: 02-Sep-2018]
[4] Echo Dot, “Amazon Echo Dot technical details”, Dish. 2018. [Online]. Available: https://www.dish.com/features/voice-control/alexa-integration/technical-specifications/ . [Accessed: 02-Sep-2018]
[5] Aircomm, “APX 4000 portable radio product specifications”, Aircomm. 2018. [Online]. Available: https://www.aircomm.com/downloads/motorola/motorola_apx4000_specsheet.pdf. [Accessed: 02-Sep-2018]
[6] Gideon Stein, “System and methods for detecting obstruction in a camera field of view”, US grant #US8553088B2. 23 Nov., 2005.
[7] Kenichi Kumatani, “Microphone array processing for distant speech recognition: from close-talking microphones to far-field sensors”, IEEE Xplore Digital Library. 2012. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/6296525/. [Accessed: 02-Sep-2018]
[8] James Chia-Ming Liu, “Total field of view classification for head-mounted display”, US grant application #US20120327116A1. 23 Jun., 2011.
[9] Gur Kimchi, “Augmenting a field of view in connection with vision-tracking”, US grant #US8494215B2. 05 Mar., 2009.
[10] Ryan Janzen, Steve Mann, “Swarm modulation: An algorithm for real-time spectral transformation”. IEEE GEM 2015. [Online]. Available: https://ieeexplore.ieee.org/document/7377214/. [Accessed: 02-Sep-2018]
[11] Steve Mann, “Augmented reality eyeglass with thermo vision…”, Instructables. 2017. [Online]. Available: https://www.instructables.com/id/Augmented-Reality-Eyeglass-With-Thermal-Vision-Bui/. [Accessed: 02-Sep-2018]
[12] Steve Mann, “HDR imaging”, Wearcam. 2018. [Online]. Available: http://wearcam.org/mannventionz/hdr.htm . [Accessed: 02-Sep-2018]
[13] Steve Mann. “Sequential Wave Imprinting Machine”. Wearcam. 2018. [Online]. Available: http://wearcam.org/swim/ . [Accessed: 02-Sep-2018]
[14] Steve Mann, Max Lv et al. “Phenomenologically Augmented Reality With New Wearable LED Sequential Wave Imprinting Machines”. ACM TEI. 2017. [Online]. Available: http://eyetap.org/docs/ACM_TEI_2017_SWIM_p751.pdf . [Accessed: 02-Sep-2018]
[15] Steve Mann, Max Lv et al. “Phenomenologically Augmented Reality with the Sequential Wave Imprinting Machine”. IEEE GEM. 2018. [Online]. Available: http://wearcam.org/gem2018/part3.pdf. [Accessed: 02-Sep-2018]
[16] Kedar Nath Sahu et al. “Study of RF Signal Attenuation of Human Heart”. Hindawi Journal of Engineering. 2015. [Online]. Available: https://www.hindawi.com/journals/je/2015/484686/. [Accessed: 02-Sep-2018]
[17] Ryan Janzen. “Veillance flux: the ability to hear, see, and sense.” Veillametrics. 2016. [Online]. Available: http://veillametrics.com/ . [Accessed: 02-Sep-2018]
[18] Ryan Janzen, Steve Mann, “Veillance Flux, Vixels, Veillons: An information - bearing extramissive formulation of sensing, to measure surveillance and sousveillance”, IEEE CCECE. 2014. [Online]. Available: http://www.veillametrics.com/Veillametrics_JanzenMann2014pub.pdf. [Accessed: 02-Sep-2018]
[19] Ryan Janzen, Steve Mann, “Sensory Flux from the Eye: Biological Sensing-of-Sensing (Veillametrics) for 3D Augmented-Reality Environments”, IEEE GEM. 2015. [Online]. Available: http://eyetap.org/docs/BioVeillametrics_JanzenMann2015.pdf. [Accessed: 02-Sep-2018]
[20] Steve Mann. “The birth of wearable computing and augmented reality”. Wearcam. 2015. [Online]. Available: http://wearcam.org/html5/mannkeynotes/engsci.htm#9 . [Accessed: 02-Sep-2018]
[21] Openframeworks. “Video feedback loop”. Openframeworks. 2013. [Online]. Available: https://forum.openframeworks.cc/t/video-feedback-loop/13347 . [Accessed: 02-Sep-2018]
[22] Steve Mann et al. “Toposculpting: Computational Lightpainting and Wearable Computational Photography for Abakographic User Interfaces”, IEEE CCECE. 2014. [Online]. Available: http://wearcam.org/abaq.pdf. [Accessed: 02-Sep-2018]
[23] Wasim Ahmad, "Picture guide all about the images", Picture Guide. 2016. [Online]. Available: http://picture.guide/2016/01/11/three-tips-better-shot-playing-in-traffic/. [Accessed: 02-Sep-2018]
[24] Exploratorium. “Persistence of Vision”. Exploratorium. 2018. [Online]. Available: https://www.exploratorium.edu/snacks/persistence-of-vision/ . [Accessed: 02-Sep-2018]
[25] Steve Mann. “Veillance Foundation: Decriminalizing Integrity in the age of hypocrisy”. Wearcam. 2017. [Online]. Available: http://wearcam.org/integrity.htm. [Accessed: 02-Sep-2018]
[26] Steve Mann, Joseph Ferenbok. "New Media and the Power Politics of Sousveillance in a Surveillance-Dominated World", Eyetap. 2013. [Online]. Available: http://www.eyetap.org/papers/docs/Surveillance_and_Society_Mann_Ferenbok_4456-9724-1-PB.pdf. [Accessed: 02-Sep-2018]
[27] Steve Mann. “Veillance Foundation: Decriminalizing Integrity in the age of hypocrisy”. Wearcam. 2017. [Online]. Available: http://wearcam.org/integrity.htm. [Accessed: 02-Sep-2018]
[28] Steve Mann, Ryan Janzen, Mir Adnan Ali, Ken Nickerson, “Declaration of Veillance (Surveillance is Half-Truth)”. IEEE GEM. 2015. [Online]. Available: http://www.eyetap.org/docs/DeclarationOfVeillance_MannEtAl2015.pdf. [Accessed: 02-Sep-2018]
[29] Steve Mann. “Veillance and Reciprocal Transparency: Surveillance versus Sousveillance, AR Glass, Lifeglogging, and Wearable Computing”. IEEE ISTA . 2013. [Online]. Available: http://www.eyetap.org/papers/docs/IEEE_ISTAS13_Veillance1_Mann.pdf. [Accessed: 02-Sep-2018]
[30] Steve Mann et al. “Cyborglogging with Camera Phones: Steps Toward Equiveillance”. ACM Conference. 2006. [Online]. Available: http://www.eyetap.org/papers/. [Accessed: 02-Sep-2018]
[31] Steve Mann. “ECE516: Intelligent image processing”. Wearcam. 2016. [Online]. Available: http://wearcam.org/ece516/ . [Accessed: 02-Sep-2018]
[32] Steve Mann. “Phenomenal augmented reality ...”. Instructables. 2016. [Online]. Available: https://www.instructables.com/id/Phenomenal-Augmented-Reality-Allows-Us-to-Watch-Ho/ . [Accessed: 02-Sep-2018]
[33] Steve Mann. “Sequential wave imprinting machine”. Wearcam. 2015. [Online]. Available: http://wearcam.org/swim/swim74/ . [Accessed: 02-Sep-2018]
[34] Steve Mann. “Heart with SWIM”. Wearcam. 2017. [Online]. Available: http://wearcam.org/swim/ecg/. [Accessed: 02-Sep-2018]
[35] Ryan Janzen and Steve Mann. "Veillance Dosimeter, inspired by bodyworn radiation dosimeters, to Measure exposure to inverse light", IEEE GEM. 2014. [Online]. Available: http://wearcam.org/gem2018/part4.pdf. [Accessed: 02-Sep-2018]
[36] James Parson, “3 Free Tools to See a Heat Map of Your Website Visitors”. Growtraffic . 2015. [Online]. Available: https://growtraffic.com/blog/2015/03/3-free-tools-see-heat-map-your-website-visitors. [Accessed: 02-Sep-2018]
[37] IMotions. “Top 8 eye tracking applications in research”. IMotions. 2015. [Online]. Available: https://imotions.com/blog/top-8-applications-eye-tracking-research/. [Accessed: 02-Sep-2018]
[38] Rachel Albert et al. “Latency Requirements for Foveated Rendering in Virtual Reality”, NVIDIA. 2017. [Online]. Available: https://research.nvidia.com/publication/2017-09_Latency-Requirements-for . [Accessed: 02-Sep-2018]
[39] Brian Guenter et al. "Foveated 3D graphics", Microsoft. 2012. [Online]. Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2012/11/foveated_final15.pdf . [Accessed: 02-Sep-2018]
[40] Susumu Tachi et al. "Foveated Streaming: Optimizing video streaming for Telexistence systems using eye-gaze based foveation", Researchgate. 2017. [Online]. Available: https://www.researchgate.net/publication/320224927_Foveated_Streaming_Optimizing_video_streaming_for_Telexistence_systems_using_eye-gaze_based_foveation . [Accessed: 02-Sep-2018]
[41] Jason. JS. Barton et al. “Optical Blur and the Perception of Global Coherent Motion in Random Dot Cinematograms”, ScienceDirect. 1996. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0042698996000636. [Accessed: 02-Sep-2018]
[42] Victor Powell, “Image Kernels ”, Setosa. 2018. [Online]. Available: http://setosa.io/ev/image-kernels/. [Accessed: 02-Sep-2018]
[43] Ryan Janzen, Sen Yang and Steve Mann. ‘“Painting with the eyes”: Sensory perception flux time-integrated on the physical world’, IEEE GEM. 2018. [Online]. Available: http://wearcam.org/gem2018/part4.pdf. [Accessed: 02-Sep-2018]
[44] Steve Mann, Raymond Lo et al. “Realtime HDR (High Dynamic Range) Video for EyeTap Wearable Computers, FPGA-Based Seeing Aids, and GlassEyes”. IEEE CCECE. 2012. [Online]. Available: http://www.eyetap.org/papers/docs/HDREyetap_IEEEccece2012paper551.pdf. [Accessed: 02-Sep-2018]
[45] Steve Mann and Mir Adnan Ali. “Comparametric image compositing: computationally efficient high dynamic range imaging”. IEEE ICASSP. 2012. [Online]. Available: http://www.eyetap.org/papers/docs/ICASSP2012_0000913.pdf. [Accessed: 02-Sep-2018]
[46] Prusa. “Prusa I3 3D printer”, Prusa. 2015. [Online]. Available: https://www.prusaprinters.org/prusa-i3/ . [Accessed: 02-Sep-2018]
[47] Steve Mann. “The chirplet transform”. Wearcam. 2018. [Online]. Available: http://wearcam.org/chirplet.htm. [Accessed: 02-Sep-2018]
[48] 3D Hubs. “Kossel Mini 3D printer”. 3D Hubs. 2018. [Online]. Available: https://www.3dhubs.com/3d-printers/kossel-mini. [Accessed: 02-Sep-2018]
[49] STL Finder. “Linear delta 3D printer”. STL Finder. 2018. [Online]. Available: https://www.stlfinder.com/model/silvestr-linear-delta-3d-printer-lblLMuUp/4860398 . [Accessed: 02-Sep-2018]
111
[50] Chris Aimone, Steve Mann. “Camera response function recovery from auto-exposure cameras”. IEEE Conference in Image Processing, 2007. [Online]. Available: https://pdfs.semanticscholar.org/a98b/. [Accessed: 02-Sep-2018]
[51] Cambridge in colours. “Understanding range in digital photography”. Cambridge in colours. 2018. [Online]. Available: https://www.cambridgeincolour.com/tutorials/dynamic-range.htm . [Accessed: 02-Sep-2018]
[52] Wikipedia Commons. “Gamma correction”. Wikipedia Commons. 2018. [Online]. Available: https://en.wikipedia.org/wiki/Gamma_correction . [Accessed: 02-Sep-2018]
[53] S. Mann. Comparametric equations, quantigraphic image processing, and comparagraphic rendering. Intelligent Image Processing, pages 103–178, 2002.
[54] Jason Huang, Steve Mann et al. “High Dynamic Range Tone Mapping Based On Per-Pixel Exposure Mapping”. IEEE ISTAS. 2013. [Online]. Available: http://www.eyetap.org/papers/docs/IEEE_ISTAS13_PPEM_Huang_etal.pdf. [Accessed: 02-Sep-2018]
[55] Jason Huang. “Real Time HDR (High Dynamic Range) Image Processing For Digital Glass Seeing Aid”. University of Toronto Thesis Library. 2016. [Online]. Available: https://tspace.library.utoronto.ca/ Huang_ShiChieh_Jason_201311_MASc_Thesis.pdf. [Accessed: 02-Sep-2018]
[56] Steve Mann. “Sitting waves”. Wearcam. 2018. [Online]. Available: http://wearcam.org/swim/C-band_radar/ stephanie_car_track/stephanie_car_track3/sittingwaves.htm. [Accessed: 02-Sep-2018]
[57] Steve Mann. “Sequential Wave Imprinting Machine: Visualizing Veillance Waves of a microphone's capacity to listen (capacity to sense sound).” 2017. [Online]. Available: http://wearcam.org/swim/SWIM4amplifiers/SWIMstephanie/ . [Accessed: 02-Sep-2018]
[58] Point Grey. “Technical Application Notes.” 2017. [Online]. Available: https://www.ptgrey.com/tan/10694. [Accessed: 25-Sep-2018]
[59] Point Grey. “Point Grey Product Catalog and Sensor Review.” 2017. [Online]. Available: https://www.ptgrey.com/support/downloads/10291 . [Accessed: 25-Sep-2018]
[60] Ryan Janzen. “Veillance flux: the ability to hear, see, and sense.” Veillametrics. 2016. [Online]. Available: http://veillametrics.com/ . [Accessed: 02-Sep-2018]
[61] Umesh Madan. “Extramission theory: the beginning of sight”, The weekend historian. 2009. [Online]. Available: https://umeshmadan.wordpress.com/tag/extramission-theory/ . [Accessed: 02-Sep-2018]
[62] Art History Resources. “Extramission versus intromission”, Art History Resources. 2014. [Online]. Available: http://arthistoryresources.net/visual-experience-2014/extramission-intromission.html. [Accessed: 02-Sep-2018]
[63] NASA. “More on Brightness as a Function of Distance”, NASA . 2016. [Online]. Available: https://imagine.gsfc.nasa.gov/features/yba/M31_velocity/lightcurve/more.html . [Accessed: 02-Sep-2018]
[64] Steve Mann, Max Lv et al. “Phenomenologically Augmented Reality with the Sequential Wave Imprinting Machine”. IEEE GEM. 2018. [Online]. Available: http://wearcam.org/gem2018/part3.pdf. [Accessed: 02-Sep-2018]
[65] Merriam Webster. “Definitions of photon”, Merriam Webster. 2018. [Online]. Available: https://www.merriam-webster.com/dictionary/photon. [Accessed: 02-Sep-2018]
[66] Britannica. “Electron”, Britannica. 2018. [Online]. Available: https://www.britannica.com/science/electron. [Accessed: 02-Sep-2018]
[67] C. Chaulk. “Conventional current versus electron flow”. 2015. [Online]. Available: https://www.mi.mun.ca/users/cchaulk/eltk1100/ivse/ivse.htm#. [Accessed: 02-Sep-2018]
112
[68] NESO Academy. “Electron vs hole flow”. NESO Academy. 2018. [Online]. Available: http://www.nesoacademy.org/ electronics-engineering/analog-electronics/chapter-1/electron-vs-hole-flow . [Accessed: 02-Sep-2018]
[69] CIS, “Rods & Cones”, CIS, 2018. [Online]. Available: https://www.cis.rit.edu/people/faculty/montag/vandplite/pages/chap_9/ch9p1.html. [Accessed: 02-Sep-2018]
[70] Hyperphysics, “Rod and cones density on retina”, Hyperphysics. 2000. [Online]. Available: http://hyperphysics.phy-astr.gsu.edu/hbase/vision/rodcone.html . [Accessed: 02-Sep-2018]
[71] Merriam-webster, “Visual Acuity”, Merriam-webster. 2018.[Online]. Available: https://www.merriam-webster.com/dictionary/visual%20acuity . [Accessed: 02-Sep-2018]
[72] Howstuffworks, “What is the difference between CCD and CMOS image sensors in a digital camera?”, Howstuffworks. 2018. [Online]. Available: https://electronics.howstuffworks.com/cameras-photography /digital/question362.htm . [Accessed: 02-Sep-2018]
[73] Jay Turberville, “Camera Field of View”, Jayandwanda. 2003. [Online]. Available: http://www.jayandwanda.com/digiscope/vignette/camerafov.html. [Accessed: 02-Sep-2018]
[74] Nature Resources Canada, “Spatial Resolution, Pixel Size, and Scale”, Governance of Canada. 2015. [Online]. Available: http://www.nrcan.gc.ca/node/9407. [Accessed: 02-Sep-2018]
[75] Shari. “The difference between barren and pincushion distortion”. Shariblog. 2013. [Online]. Available: http://www.shariblog.com/2013/08/difference-between-barrel-pincushion-distortion . [Accessed: 02-Sep-2018]
[76] OpenCV. “Camera calibration With OpenCV”. OpenCV. 2018. [Online]. Available: https://docs.opencv.org/3.4.0/d4/d94/tutorial_camera_calibration.html . [Accessed: 02-Sep-2018]
[77] Wikipedia Commons. “Vignetting”. Wikipedia Commons. 2017. [Online]. Available: https://en.wikipedia.org/wiki/Vignetting. [Accessed: 02-Sep-2018]
[78] Aplicaciones de la Visión Artificial. “ArUco: a minimal library for Augmented Reality applications based on OpenCV”. Aplicaciones de la Visión Artificial. 2018. [Online]. Available: http://www.uco.es/investiga/grupos/ava/node/26 . [Accessed: 02-Sep-2018]
[79] M. I. Posner, “Orienting of attention,” Quarterly journal of experimental psychology, vol. 32, no. 1, pp. 3–25, 1980.
[80] M. I. Posner, C. R. Snyder, and B. J. Davidson, “Attention and the detection of signals,” Journal of experimental psychology: General, vol. 109, no. 2, p. 160, 1980.
[81] M. H. Fischer, A. D. Castel, M. D. Dodd, and J. Pratt, “Perceiving numbers causes spatial shifts of attention,” Nature Neuroscience, vol. 6, no. 6, p. 555, 2003.
[82] Ryan Janzen, Steve Mann. “Veillance Flux, Vixels, Veillons: An information-bearing extramissive formulation of sensing, to measure surveillance and sousveillance”. IEEE CCECE. 2014. [Online]. Available: http://veillametrics.com/Veillametrics_JanzenMann2014pub.pdf. [Accessed: 02-Sep-2018]
[83] Steve Mann, Ming-Chang Tsai et al, “Effectiveness of integral kinesiology feedback for fitness based games”. IEEE GEM. 2018. [Online]. Available: https://www.researchgate.net/publication/327076095_ Effectiveness_of_Integral_Kinesiology_Feedback_for_Fitness-based_Games. [Accessed: 02-Sep-2018]
[84] Dale Purves et al. “Anatomical distribution of rods and cones”. 2001. NCBI Publications. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK10848/ . [Accessed: 02-Sep-2018]
[85] Brian Ventrudo. “Averted vision”. One-minute Astronomer. 2008. [Online]. Available: https://oneminuteastronomer.com/86/averted-vision/ . [Accessed: 02-Sep-2018]
113
[86] Ryan Janzen, Steve Mann, “Sensory Flux from the Eye: Biological Sensing-of-Sensing (Veillametrics) for 3D Augmented-Reality Environments”, IEEE GEM. 2015. [Online]. Available: http://eyetap.org/docs/BioVeillametrics_JanzenMann2015.pdf. [Accessed: 02-Sep-2018]
[87] Ole Baunbæk Jensen. “Webcam-Based Eye Tracking vs. an Eye Tracker”. IMotions. 2017. [Online]. Available: https://imotions.com/blog/webcam-eye-tracking-vs-an-eye-tracker/ . [Accessed: 02-Sep-2018]
[88] Wikipedia Commons. “Eye tracking”. Wikipedia Commons. 2018. [Online]. Available: https://en.wikipedia.org/wiki/Eye_tracking. [Accessed: 02-Sep-2018]
[89] Britannica. “Electroencephalography”. Britannica. 2018. [Online]. Available: https://www.britannica.com/science/electroencephalography. [Accessed: 02-Sep-2018]
[90] Tobii Pro. “Dark and light pupil tracking”. Tobii Pro. 2018. [Online]. Available: https://www.tobiipro.com/ learn-and-support/learn/eye-tracking-essentials/what-is-dark-and-bright-pupil-tracking/ . [Accessed: 02-Sep-2018]
[91] Carlos Hitoshi Morimoto. “Pupil detection and tracking using multiple light sources”. Semantic Scholar. 2000. [Online]. Available: https://www.semanticscholar.org/paper/Pupil-detection-and-tracking-using- multiple-light-Morimoto-Koons/66d35a96866cb166d2a14def392ce4aa602595a9 . [Accessed: 02-Sep-2018]
[92] Chi Jian-nan et al. “Key Techniques of Eye Gaze Tracking Based on Pupil Corneal Reflection”. IEEE Xplore Digital Library. 2009. [Online]. Available: https://ieeexplore.ieee.org/document/5209377/. [Accessed: 02-Sep-2018]
[93] Wikipedia Commons. “Tapetum Lucidum”. Wikipedia Commons. 2018. [Online]. Available: https://en.wikipedia.org/wiki/Tapetum_lucidum. [Accessed: 02-Sep-2018]
[94] Wikipedia Commons. “Retroreflector”. Wikipedia Commons. 2018. [Online]. Available: https://en.wikipedia.org/wiki/Retroreflector . [Accessed: 02-Sep-2018]
[95] Jason S.Babcock & Jeff B. Pelz. “Building a lightweight eyetracking headgear”. Rochester Institute of Technology Publications. 2004. [Online]. Available: http://www.cis.rit.edu/pelz/publications/ETRA04_babcock_pelz.pdf . [Accessed: 02-Sep-2018]
[96] Renesas. “Eye Safety for Proximity Sensing Using Infrared Light-emitting Diodes”. Renesas. 2016. [Online]. Available: https://www.intersil.com/content/dam/Intersil/documents/an17/an1737.pdf. [Accessed: 02-Sep-2018]
[97] OpenCV Documentations. “Rodrigues”. OpenCV Documentations. 2018. [Online]. Available: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#rodrigues. [Accessed: 02-Sep-2018]
[98] OGL Dev. “Camera space”. OGL Dev. 2018. [Online]. Available: http://ogldev.atspace.co.uk/www/tutorial13/tutorial13.html. [Accessed: 02-Sep-2018]
[99] Steve Mann. “EyeTap: The eye itself as display and camera”. Eyetap. 2018. [Online]. Available: http://www.eyetap.org/research/eyetap.html . [Accessed: 02-Sep-2018]
[100] Hao Lv, Sen Yang et al. “Open Source EyeTap: Empowering Every Maker with Phenomenal Augmented Reality and Wearable Computing”. ISWC. 2017. [Online]. Available: http://www.eyetap.org/papers/docs/OpenEyeTap_ISWC2017.pdf . [Accessed: 02-Sep-2018]
[101] OpenEyetap. “Main page”. OpenEyetap. 2018. [Online] Available: https://www.openeyetap.com/. [Accessed: 02-Sep-2018]
114