REGULAR PAPER

A general framework for managing and processing live video data with privacy protection

Alexander J. Aved · Kien A. Hua

© Springer-Verlag 2011
Abstract Though a large body of existing work on video surveillance focuses on image and video processing techniques, few address the usability of such systems, and in particular privacy issues. This study fuses concepts from stream processing and content-based image retrieval to construct a privacy-preserving framework for rapid development and deployment of video surveillance applications. Privacy policies, instantiated as privacy filters, may be applied both granularly and hierarchically. Privacy filters are granular as they are applicable to specific objects appearing in the video streams. They are hierarchical because they can be specified at different levels of the framework (e.g., users, cameras) and are combined such that the disseminated video stream adheres to the most stringent aspect specified in the cascade of all privacy filters relevant to a video stream or query. To support this privacy framework, we extend our Live Video Database Model with an informatics-based approach to object recognition and tracking and add an intrinsic privacy model that provides a level of privacy protection not previously available for real-time streaming video data. The proposed framework also provides a formal approach to implement and enforce privacy policies that are verifiable, an important step towards privacy certification of video surveillance systems through a standardized privacy specification language.

Keywords Query language · Privacy framework · Video database system · Real-time · Object recognition · Object tracking
1 Introduction
Camera networks have been the subject of intensive research in recent years, and can range from a single camera to a network of tens of thousands of cameras. Usability is an important contributing factor to the effectiveness of such networks. As an example, a camera network in London cost £200 million over a 10-year period. However, police are no more likely to catch offenders in areas with hundreds of cameras than in those with hardly any [29]. This phenomenon is typical for large-scale applications and can be attributed to the fact that the multitude of videos is generally not interesting, and constant manual monitoring of these cameras for occasional critical events can become fatiguing. Due to this limitation in real-time surveillance capability, cities mainly use video networks to archive data for post-crime investigation.
To address operator fatigue and increase the effectiveness of the camera network, automatic video processing techniques have been developed for real-time event detection under various specific scenarios. These systems, such as the live video database discussed in this study, can monitor live video streams in real time for events of interest and can alert human operators upon their detection. However, pervasive monitoring by corporate and governmental entities can lead to privacy concerns. For example, archived video from a police camera network could later be used for purposes other than those for which it was originally collected. Public information laws could make video footage collected for legitimate purposes available to anyone who requests it. In a corporate setting, cameras deployed to record customer behavior might capture employees after their work shift has ended.
Deploying a sizable camera network entails a significant
monetary investment. For aesthetic reasons it is also
A. J. Aved (✉) · K. A. Hua
University of Central Florida, Orlando, FL, USA
e-mail: [email protected]

K. A. Hua
e-mail: [email protected]

Multimedia Systems
DOI 10.1007/s00530-011-0245-x
desirable to minimize the number of cameras deployed (e.g., in a historic district of a town). For reasons such as these, it would be beneficial if multiple entities could share access to the cameras. Police could monitor for suspicious activity and crime scene evidence, utility companies could gauge outages after inclement weather, and sanitation departments could assess the productivity of new employee vehicle operators, to name just a few possible collaborators. Possible benefits include shared deployment costs, and the possibility of providing service to stakeholders who otherwise could not justify the expense of a single-purpose camera network deployment on their own. However, shared usage and monitoring by indeterminate or changing interests could lead to significant privacy concerns. Thus, to address the usability and privacy concerns of a general-purpose camera network, three factors are important:
1. The software system must support ad hoc monitoring tasks. An event of interest in one domain such as transportation monitoring is generally different from another domain such as crime prevention. Events of interest can also vary significantly between individual users.
2. It is desirable to provide the capability to enable rapid development of customized applications for domain-specific users, such as counting cars along a section of highway, or informing an employee when a nearby conference room becomes available early.
3. People are concerned about privacy and there are increasing objections to pervasive monitoring. For applications that run atop a camera network, there is a need for policies which specify the level of privacy they adhere to, and mechanisms which implement said policies.
To achieve the first two factors, we have designed and implemented a general-purpose Live Video Database Management System (LVDBMS) [25] as a platform to facilitate live video computing. It allows automatic monitoring and management of a large number of live cameras. The cameras in the network are treated as a special class of storage, with the live video streams viewed as database content. The user is able to specify an ad hoc video monitoring task by formulating a query to describe a spatiotemporal event. When the event occurs and is detected by a monitoring query, an action associated with the query is executed. This general-purpose LVDBMS also enables rapid development of live video applications, much as database applications are developed atop standard database management systems today. Another work that allows the user to specify semantically high-level composite events for video surveillance is presented in [31]. However, that technique requires the user to formulate queries procedurally using low-level operators. In contrast, our query language is declarative: the user only defines the event, and the system automatically generates the corresponding query processing procedure.
To address the third factor mentioned previously, we present in this article a privacy framework for the LVDBMS. This framework implements a privacy specification language (PSL) that permits privacy policies to be specified, and enforced by removing identifiable information pertaining to objects from the video streams as they are made available externally by the system. Example consumers of streaming video could be a file for storage, an operator's video terminal, or a live video feed captured for use in a traffic report on a television news broadcast. This facility allows the user to specify various privacy views atop the raw video stream to remove the objects from the output stream, with the aim of protecting individual privacy while retaining general trends evidenced in the video stream such that further scene analysis is meaningful. Here objects refer to a person, animal, or vehicle that is not part of the video background. These objects are characterized using a multifaceted object model. As part of the new extension to our prototype, we also introduce in this article an informatics-based cross-camera tracking technique. This scheme permits queries to be defined that span multiple video streams.
The remainder of this article is organized as follows. In
Sect. 2, we give an overview of our LVDBMS to make the
article self-contained. The proposed privacy framework is
introduced in Sect. 3, with experimental results in Sect. 4.
Related work is discussed in Sect. 5. Finally, we conclude
this article in Sect. 6.
2 LVDBMS environment
In this section, we briefly introduce LVDBMS, a distributed video stream processing Live Video Database (LVD) environment, and refer the reader to [25] for further details. Large networks of cameras produce a proliferation of multimedia information, and there is a profound need to manage and organize the volumes of video data into proportions relevant for human consumption. In an environment with numerous video cameras, the goal is to provide human operators with a facility to specify relevant scenes of interest, minimizing exposure to uninteresting and irrelevant scenes. However, today there is a technological gap between what state-of-the-art software technologies can provide in terms of identifying rich content, and consumer expectations. The LVDBMS allows users to mine numerous video streams for events of interest and perform actions when the specified scenarios are encountered.
2.1 LVDBMS architecture
The LVDBMS is a distributed video database management system. Operators interact with the highest layer, and view video streams and query results through computer terminals (Fig. 1). Figure 2 illustrates the four logical tiers implemented in the LVDBMS architecture, which communicate through web services interfaces. Multiple cameras in the camera layer may be associated with a single host in the spatial processing layer. Spatial processing layer hosts perform real-time motion imagery processing for abstract object recognition and partial query evaluation (dependent upon the data available at the host). Intermediate query results are streamed to a host in the stream processing layer, which periodically computes the final query result that is made available to clients in the client layer. We discuss these software layers in more detail in the following sections.
2.1.1 Camera layer
The camera layer (Fig. 3, left) consists of physical devices that capture images. Each camera is paired with a camera adapter which runs on a host computer. We do not assume that the cameras have built-in analytical capability beyond capturing images and making them available to a corresponding camera adapter.
The camera adapter allows any type of camera to be used with the LVDBMS, given relevant drivers. When processing a scene, the camera adapter first performs scene analysis on the raw image data to determine background pixels and foreground objects. The segmented objects then flow into the frame-to-frame tracking module, which tracks objects across consecutive frames in a camera's field of view and assigns them an object number (unique within the camera adapter). For each image a bag of feature vectors is calculated (a bag is similar to a set but allows for duplication). Once each frame from the camera is processed, it is bundled as an image descriptor and sent to the spatial processing layer for query evaluation. The image descriptor may contain the actual image bitmap if specifically requested by the spatial processing layer host, but otherwise it contains only image metadata, such as the identifiers and locations of objects identified within the frame, the corresponding feature vectors, frame sequence number, etc.
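The shape of such a descriptor can be sketched as follows. This is a minimal, hypothetical Python sketch; the paper does not give an exact schema, so all field names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the per-frame image descriptor described above;
# field names are ours, not the paper's.

@dataclass
class TrackedObject:
    object_id: int               # unique within the camera adapter
    bounding_box: tuple          # (x, y, width, height) in frame coordinates
    features: List[List[float]]  # bag of feature vectors (duplicates allowed)

@dataclass
class ImageDescriptor:
    camera_id: str
    frame_seq: int                  # frame sequence number
    objects: List[TrackedObject]
    bitmap: Optional[bytes] = None  # raw image, only if requested by the host

# A descriptor normally carries only metadata; the bitmap stays None.
desc = ImageDescriptor(
    camera_id="C1",
    frame_seq=1042,
    objects=[TrackedObject(121, (40, 60, 32, 88), [[0.1, 0.7], [0.1, 0.7]])],
)
```

Keeping the bitmap optional reflects the design choice above: the spatial processing layer usually evaluates queries on metadata alone and requests pixels only when needed.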
2.1.2 Spatial processing layer
Spatial processing layer hosts evaluate spatial and temporal operators over the streams of image descriptors and provide a result stream to the stream processing layer (Fig. 3, middle). A server hosting the spatial processing layer will service many cameras, but a camera adapter will be associated with only a single instance of the spatial processing layer. Server replication at this layer allows the LVDBMS to scale to an arbitrarily large number of video streams.

Fig. 1 LVDBMS hardware architecture

Fig. 2 The LVDBMS is logically grouped into three layers plus a client layer
2.1.3 Stream processing layer
The stream processing layer (Fig. 3, right) accepts queries submitted by clients and partial query evaluation results from the spatial processing layer. It does not interact directly with cameras or their adapters. We note that we can have replication at the stream processing layer for fault tolerance. Queries are decomposed into an algebraic tree structure which is partitioned by the host and pushed down to relevant servers in the spatial processing layer. As subqueries are evaluated, results are streamed back to the stream processing layer, where they are combined and the final query result computed. Subquery results may arrive out of order or get lost in the network, or a camera may unexpectedly go offline; such conditions must be handled gracefully.
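One plausible way to tolerate out-of-order and lost sub-query results is to buffer them by sequence number, release them in order, and give up on gaps that fall too far behind. The sketch below is our own illustration under those assumptions, not the paper's implementation; the class and parameter names are hypothetical.

```python
# Illustrative sketch of tolerating out-of-order or lost sub-query results:
# results are released in sequence order, and a gap older than `horizon`
# positions is skipped as lost in the network.

class SubResultBuffer:
    def __init__(self, horizon=5):
        self.pending = {}       # seq -> partial result
        self.next_seq = 0
        self.horizon = horizon  # how far results may run ahead of a gap

    def offer(self, seq, result):
        """Accept a sub-result; return the list now releasable in order."""
        if seq >= self.next_seq:        # drop duplicates / stale arrivals
            self.pending[seq] = result
        released = []
        while self.pending:
            if self.next_seq in self.pending:
                released.append(self.pending.pop(self.next_seq))
            elif max(self.pending) - self.next_seq >= self.horizon:
                pass  # gap too old: treat this sub-result as lost, skip it
            else:
                break  # wait for the missing sub-result to arrive
            self.next_seq += 1
        return released

buf = SubResultBuffer()
buf.offer(1, "r1")        # held: still waiting for seq 0
out = buf.offer(0, "r0")  # releases "r0" then "r1", in order
```

A camera going offline looks to this buffer like a permanent gap, which the horizon eventually skips so the query keeps producing results.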
2.1.4 Client layer
Users connect to the LVDBMS and submit queries using a graphical user interface, depicted in Fig. 4. The client allows users to browse available video cameras, define and submit queries, and review results.
2.2 LVDBMS data model
A query is a spatiotemporal specification of an event, posed over video streams, and expressed in Live Video SQL (LVSQL), a structured query language. A query defines which streams will be accessed and what information will be returned. In an LVD, information is contained in live video streams which are input to the LVDBMS in real time. The fundamental construct in an LVD is an object, which is either indicated by the user (a static object) or automatically detected (a dynamic object).

Fig. 3 Software layers of the LVDBMS and major components contained therein

Fig. 4 The LVDBMS client allows users to browse cameras, construct queries, and send system commands
We provide a brief description of the LVD data model and refer the reader to [25] for additional details. A video stream consists of temporally ordered frames, where each frame represents a snapshot of what was detected by an image sensor at a particular time. An object, then, is some real-world physical entity whose image was captured and is represented in the frame. (For our definition of object, we do not consider the background to be an object unless it is specifically indicated in a query.) There are two types of operators in LVDs: spatial and temporal. Spatial operators are formulated over objects that are visually captured in video streams (i.e., overlaps, meets, disjoint, exists, etc.). Temporal operators evaluate the temporal relationships between spatial events (i.e., before, during, etc.). When constructing LVSQL statements, there are three types of objects that may be referenced:
1. Static objects are indicated by the user by drawing an outline on a frame captured from the video stream, at query submission time.
2. Dynamic objects are objects that appear in a video stream and are not part of the background. They may be specified with an asterisk (*) in a query.
3. Cross-camera dynamic objects are dynamic objects detected in one camera and matched with an object that was (or is) viewed in another camera. In the query language, these objects are denoted with a pound sign (#), e.g., Before(Appear(V1.#), Appear(V2.#), 120), which queries for an object that appears in stream V2 within 120 s of appearing in V1.
As events occur in real time, queries must be resolved in real time as well. Queries that require temporarily storing historical video data are always parameterized such that the data that must be retained for query evaluation is confined to a temporal sliding window. The resolution of a query refers to the frequency with which the query is evaluated. For example, a query that is evaluated five times each second is of finer resolution than a query evaluated once each second.
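The bounded retention described above amounts to a fixed-capacity FIFO buffer. A minimal sketch, assuming a window size k chosen to cover the longest temporal window any active query needs:

```python
from collections import deque

# Minimal sketch of the temporal sliding window: only the most recent
# k frames are retained; older frames are evicted automatically.

k = 4                      # illustrative window size
window = deque(maxlen=k)   # FIFO with bounded capacity

for frame_seq in range(10):   # frames arriving in real time
    window.append(frame_seq)

# After 10 frames, only the last k remain available for evaluation.
list(window)   # [6, 7, 8, 9]
```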
2.3 LVDBMS query language
The LVSQL query language is used to pose continuous queries over events that occur in live video streams in real time. The essential form of an LVSQL query is as follows:

ACTION <Action>
ON EVENT <Event>

where the <Event> syntax is expanded upon in Fig. 5. In LVSQL, spatial operators take objects as arguments, and temporal operators take the output of spatial operators (i.e., bit streams) as arguments. Boolean logic combined with the various temporal and spatial operators results in a very expressive language capability.
All queries must involve a spatial operator; the simplest expressible query could check for the existence of any object appearing in the field of view of a particular camera. This spatial query could then be enhanced with a temporal component, for example a duration: trigger an alarm if an object appears and persists for longer than 10 min. Or two spatial operators could be combined with a temporal operator: alert if an object contacts a particular desk, then walks through a door.
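The pipeline described above, where spatial operators emit bit streams that temporal operators consume, can be sketched as follows. This is our own illustrative Python sketch; the operator names `exists` and `persists` are assumptions, not LVSQL internals.

```python
# Illustrative sketch: a spatial operator maps each frame to a bit, and a
# temporal operator consumes the resulting bit stream.

def exists(frames):
    """Spatial operator: 1 if any dynamic object is present in the frame."""
    return [1 if frame["objects"] else 0 for frame in frames]

def persists(bits, min_run):
    """Temporal operator: true if the condition holds for min_run
    consecutive evaluations (e.g., an object lingering too long)."""
    run = 0
    for b in bits:
        run = run + 1 if b else 0
        if run >= min_run:
            return True
    return False

frames = [{"objects": []}, {"objects": ["D121"]},
          {"objects": ["D121"]}, {"objects": ["D121"]}]
alarm = persists(exists(frames), min_run=3)   # True: object persisted
```

Composing further temporal operators (Before, During, etc.) over such bit streams is what gives the declarative language its expressiveness.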
3 Privacy filter framework
In this section, we introduce our privacy framework and explain how it is applied. The implementation details are discussed in Sect. 4. While members of the public generally accept having their image recorded by cameras, it is a violation of their trust to use their data for purposes people may find intrusive, or to have their image used for reasons contrary to the known usage of the cameras. Examples of

Fig. 5 LVSQL event specification syntax
intrusive uses could be a security guard observing shoppers for personal edification during a rest period, or corporate mining of the behavior of individuals in a store so that items may be marketed to them the next time they enter that store. As cameras become ubiquitous in public locations, camera networks become smarter, and storage capacity increases such that larger volumes of video data may be retained for longer periods, there is increasing concern that video data collected for one purpose may be used for other purposes. If an intrusive usage were known to the target individual, they might have chosen not to participate, for example by shopping in a competing mall. In this article, we introduce privacy filters as a framework that can anonymize the people observed by networked video cameras in accordance with a privacy policy. Depending upon the specifics of the privacy policies being enforced, global trends in the videos could still be visible, such as people going into and out of a room, while the appearances of the individuals could be redacted, thus minimizing the potential for misuse of the video and unintended consequences for the people being observed.
We endeavor to protect the identity of innocent individuals. However, some users need the option of investigating and identifying individuals (if they have the authority to do so). In this scenario, the LVDBMS is designed to allow someone with the proper access to investigate the identity of individuals via unperturbed video streams, while applying privacy filters to protect individual privacy by blocking video stream consumers without sufficient access privileges. Thus, privacy enforcement does not affect the intended utility or intelligibility of the video. The challenge is to accomplish this in real time without restricting who consumes the stream, so that actions triggered by events occurring in the stream are timely and relevant.
3.1 Scope and assumptions
This section describes the objectives of the transformation
that privacy filters induce upon corresponding video
streams. It also defines the scope of implementation
assumptions inherent to our LVDBMS prototype.
3.1.1 Scope of privacy applied to video stream output
The proposed privacy framework is based on the concept of privacy filters. A privacy filter implements a privacy policy, which specifies under what circumstances the appearance of objects in video streams passing through the filter may be observed or must be redacted from the stream. The primary goal of privacy filters is to obfuscate the appearance of qualifying objects such that objects become unidentifiable after passing through the privacy filter. More precisely, a privacy filter defines a set of criteria. These criteria are matched with objects that are identified in video streams. The criteria can be very precise (e.g., objects that satisfy a particular query condition based upon temporal and spatial location) or general (e.g., applied to all objects observed by a particular camera). Thus, the scope of privacy filters is the salient objects appearing in the video stream; not the environment, such as the scene background observed by a camera, or other observable conditions such as time of day, or conjectures based upon knowledge of the location of the camera from which the video stream originates. Furthermore, privacy filters are applied to video streams as a final step before the streamed data is externalized from the system. In order to maximize query accuracy, queries and internal indices for object tracking are based upon metrics calculated from non-obfuscated data. This raw metadata is never externalized by the LVDBMS or explicitly saved to persistent storage.
3.1.2 Scope of system prototype implementation
The focus of this research is privacy policies, the realization of those policies as privacy filters, and the corresponding transformations privacy filters have upon video streams. We do not consider as part of this work aspects of system security that must be addressed in an actual physical deployment to a public area. For example, we do not consider the physical aspects of the system, such as the physical security of servers hosting LVDBMS software and cameras, or the communications channels between cameras and LVDBMS hosts. However, we note that such concerns can be addressed by other means, such as purchasing fixtures to hold the cameras and enabling encrypted communication tunnels via the operating system or through virtual private networks. Furthermore, we do not attempt to detect and thwart privacy attacks against the system, such as a series of specifically crafted queries issued by a user and designed to leak unintended information. Although we do implement certain safeguards, such as providing a mechanism to restrict which cameras and video streams a user can observe, we assume a user is who she presents herself to be, and not a malicious user masquerading as a legitimate system user.
3.2 Framework overview
Privacy filters may be applied at different levels in the LVDBMS system hierarchy, and video streams may be affected by multiple privacy filters. This cascade of privacy filters is conceptually similar to how views may restrict columns in a traditional relational database (Fig. 6,
left). When a video stream passes through multiple privacy filters, the effect is that the most stringent privacy level is applied (note that a privacy filter does not necessarily apply to every object appearing in a particular frame of video in a passing stream; Fig. 6, right). Similarly, in a relational database a user may be allowed to access only views, and relational views may be built upon other views, which may themselves reference the physical tables or yet other views.
In the LVDBMS hierarchy, privacy filters are associated with cameras, queries, user groups, and view objects:

Camera: Any camera within the system can have a privacy filter associated with it. Applied at this level, the privacy filter has the broadest impact, as it affects all consumers of the camera.

Query: A privacy filter at this level has a moderate impact, as it is associated with a specific query. It affects only the consumers of the query's output.

User Group: A privacy filter at this level has the narrowest impact. Only the users in the group are affected.

View: A view, implementing a privacy filter, may be defined over a stream or a previously defined view. Queries and users may access the underlying video stream through the view, with the constraint that the privacy filter will be applied to the view's output.
3.3 Filter output sanitation model
While the previous section provided a conceptual definition of privacy filters, for clarity this section gives a more precise treatment. Let Q be the set of active queries posed over the set of streams 𝒮 in the LVDBMS. A stream S ∈ 𝒮 in the stream processing layer is a first-in, first-out (FIFO) sequence of frames S = {f_i, f_{i−1}, …, f_{i−k+1}}, where k is the lesser of the maximum number of frames required to resolve any active query q ∈ Q and a system-defined maximum. (Note that frame f_i ∈ S for any S represents the most recent image captured by a camera, after a negligible processing and communication delay, and S is maintained in real time as frames are received from spatial processing layer hosts.)

Frames in S are retrieved via a frame access function:

Acc(S, k) → frame

which retrieves the frame and corresponding metadata in the kth position in S. When a stream is externalized from the LVDBMS (such as for display on a user's terminal, saved to a file, etc.), it is passed through a sanitizer function:

San(S, f) → Acc(S, 1) ⊗ Z(f)

where Z returns a mask that indicates which regions of the frame to obscure in accordance with the privacy filter f, and ⊗ obfuscates the image bitmap contained in the frame with the mask, perturbing the output of San. When selecting from a view with a privacy filter f′, San becomes:

San(S, f) → Acc(S, 1) ⊗ Z(f * f′)

where * combines the filters as described in the previous section. In the literature, some sanitizers choose not to answer queries or to add noise drawn from a statistical distribution. However, Z is deterministic in its parameter f. Furthermore, detecting an attack (that is, a determination of information the system attempts to redact with the mask) is beyond the scope of this study, and all queries are assumed to be legitimate.
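The mask-and-obfuscate step can be illustrated with a toy sketch. This is our own simplification under stated assumptions: frames are 2D lists of grayscale values, Z(f) is reduced to a list of rectangular regions, and obfuscation is blanking (a real system might blur or pixelate instead).

```python
# Toy sketch of the sanitizer: build a boolean mask of regions the privacy
# filter says to obscure, then apply it to the most recent frame.

def z_mask(filter_regions, height, width):
    """Build a boolean mask from the filter's (r0, c0, r1, c1) boxes."""
    mask = [[False] * width for _ in range(height)]
    for (r0, c0, r1, c1) in filter_regions:          # inclusive boxes
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                mask[r][c] = True
    return mask

def sanitize(frame, mask, fill=0):
    """Obfuscate masked pixels (here by blanking them to `fill`)."""
    return [[fill if mask[r][c] else frame[r][c]
             for c in range(len(frame[0]))] for r in range(len(frame))]

frame = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
mask = z_mask([(0, 0, 1, 1)], 3, 3)   # obscure the top-left 2x2 region
out = sanitize(frame, mask)           # [[0, 0, 9], [0, 0, 9], [9, 9, 9]]
```

Note that `sanitize` returns a new frame, mirroring the design above in which the raw, non-obfuscated data is never externalized.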
Fig. 6 Relational database view (left) compared with illustration of a cascade of privacy filters (right)
3.4 PSL
The PSL allows a system administrator to implement privacy policies by constructing privacy filters, and to manage system user access. Privacy filters can be associated with groups, and thus with individual users through their user-group membership association. When applied to a group, a privacy filter affects all people in the group. Privacy filters may also be associated with views, in which case they apply to any accessors of the view. (Privacy filters may also be associated with cameras, but this is specified in the configuration file associated with the camera's adapter.) All privacy filters are cumulative; the system does not provide a way to reduce privacy by adding a new privacy filter.

When a user creates a query, their privacy filters (via group memberships) are in turn associated with the query. However, a system administrator may create canned queries which users may run unaffected by the privacy filters associated with the executor's account. For example, such a query could save an unperturbed video stream to a persistent storage location the user does not have access to, for use if the user believes a crime is occurring in view of a camera to which they do not have unrestricted access. The syntax for the PSL is provided in Fig. 7.
3.5 Design and implementation of privacy filters
This section presents operational and implementation details of privacy filters in the LVDBMS. When two or more privacy filters apply to a stream, they are combined into an effective privacy filter.

An active query is a currently running query in the LVDBMS system. Each query has operators that take as input object(s) from cameras (i.e., spatial operators). A relevant object is an object that appears in a video stream referenced in an active query, and can potentially contribute to the query evaluating to true. If the query becomes true, the contributing relevant object is called a target object; otherwise, it is a non-target object. Consider the query Contains(C1.S1, C1.*) illustrated in Fig. 8. This query determines if there is any dynamic object detected within the static object S1, represented by the dashed rectangle. Since the dynamic object D121 is contained within S1, D121 is a target object. On the other hand, since another dynamic object D102 is not contained in S1, D102 is a non-target object. A privacy filter can be specified so that only target objects, only non-target objects, or all relevant objects should be protected (i.e., have their appearance obscured). If a protected object no longer satisfies a privacy filter specification, the object obtains the status previously masked. In this example, if the privacy filter is to blur all target objects, then the dynamic object D121 becomes a previously masked object after it leaves the boundary of S1. We note that displaying a live video stream from a given camera is similar to a query which always evaluates to true, where the output video stream is the same as the input video stream.
{CREATE | UPDATE | DELETE} FILTER filter_identifier
  [TARGET = {QUERYTARGETS | NONQUERYTARGETS | PREVIOUSLYMASKED}]
  [TEMPORALSCOPE = {QUERYNONACTIVE | QUERYACTIVE | PERMANENT}]
  [OBJECTSCOPE = {STATIC | DYNAMIC | CROSSCAMERADYNAMIC}]

{CREATE | UPDATE | DELETE} VIEW view_identifier OVER stream_identifier
  [WITH filter_identifier]

{ASSOCIATE | DISASSOCIATE} GROUP group_identifier WITH
  {FILTER | VIEW} filter_identifier

{CREATE | DELETE} USERGROUP group_identifier

{ASSOCIATE | DISASSOCIATE} USER user_identifier WITH group_identifier

Fig. 7 Privacy Specification Language; uppercase represents a keyword and italics a user-supplied parameter

Fig. 8 QueryTarget versus NonQueryTargets: D121 satisfies the query condition and is a target object
3.6 Defining privacy filters
Privacy filters are created in the LVDBMS in one of two
ways: they are specified in configuration files at system
startup, or are created with the create filter PSL statement.
A privacy filter is specified by the attribute 3-tuple {Target,
TemporalScope, ObjectScope}, where Target, TemporalScope, and
ObjectScope are enumerated in Tables 1, 2, and 3. Not all attributes
are applicable to every use of a privacy filter; an inapplicable
attribute may be set to None. The Target parameter specifies whether
the filter applies to only target objects, to non-target objects, to
all relevant objects defined over the camera (static objects defined
by users and dynamic objects detected in the video stream), or to
None, indicating that this attribute should not be considered when
determining whether a privacy filter applies to a particular object.
ObjectScope refers to the type (scope) of objects the filter applies
to: static objects, dynamic objects, cross-camera dynamic objects,
none, or all objects appearing in the stream. TemporalScope
indicates when the filter will be applied: always, never (the filter
is currently inactive), or when the stream is or is not being
accessed by a query.
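For illustration, the 3-tuple and the per-attribute priorities of Tables 1, 2, and 3 can be modeled directly. The following is a Python sketch (the prototype itself is implemented in C#, so all names here are illustrative, not the system's actual API):

```python
from dataclasses import dataclass

# Priority of each attribute value, per Tables 1-3.
TARGET_PRIORITY = {
    "None": 1, "QueryTargets": 2, "NonQueryTargets": 2,
    "PreviouslyMasked": 2, "All": 3,
}
TEMPORAL_PRIORITY = {
    "None": 1, "QueryNonActive": 2, "QueryActive": 2, "Permanent": 3,
}
OBJECT_PRIORITY = {
    "None": 1, "CrossCameraDynamic": 2, "Dynamic": 2, "Static": 2, "All": 3,
}

@dataclass(frozen=True)
class PrivacyFilter:
    """A privacy filter as the 3-tuple {Target, TemporalScope, ObjectScope}."""
    target: str = "None"
    temporal_scope: str = "None"
    object_scope: str = "None"

    def priorities(self):
        # Numeric priority of each attribute, used when filters are combined.
        return (TARGET_PRIORITY[self.target],
                TEMPORAL_PRIORITY[self.temporal_scope],
                OBJECT_PRIORITY[self.object_scope])

# Example: the camera-level default filter from Example 1.
camera_filter = PrivacyFilter("All", "None", "None")
```

A filter with all attributes at priority 1 imposes no privacy on its own; higher-priority values redact more object observations.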
3.7 Privacy filters applied to cameras
Logically, we may associate privacy filters with physical
cameras, but to make the LVDBMS software more flexible
with respect to the types of cameras that can be used with the
system, privacy filters are actually evaluated by the camera
adapter.

When the camera adapter is initialized, it takes its initial
configuration from a configuration file. The initial state of
its privacy filter can be specified in this file and persists
for the lifetime of the adapter's state. (When the LVDBMS
is in operation, an operator may specify new default privacy
filter settings.) A camera's privacy filter is maintained
in the camera adapter.
Example 1 If a camera has a privacy filter with an
attribute set to None, then by itself that filter has no
effect. However, when combined with the privacy filter of
an active query, it can elevate the privacy state. For
example, a camera with a default privacy filter of
{All, None, None} will not produce any effective privacy
state on its own when a query is evaluated with its images.
However, combined with a query carrying privacy filter
{QueryTargets, QueryActive, Dynamic}, it yields the
effective privacy state {All, QueryActive, Dynamic}. The
difference is that all dynamic objects will be obscured,
instead of only those that are query targets.
The attribute values of a privacy filter apply at the
camera level as follows. The Target parameter specifies
whether the filter applies to objects that are the targets of
active queries (QueryTargets), objects that are not targets of
active queries (NonQueryTargets), all objects defined over
the camera (static, dynamic, and cross-camera dynamic)
(All), or no objects (None). PreviouslyMasked refers to
objects that previously qualified for inclusion in a privacy
filter (i.e., they were a non-query target in a camera with
the NonQueryTargets attribute set, were a query target under
the QueryTargets attribute, etc.). We note that, from the
perspective of a camera, an active query is a query that (1) has
been issued to the LVDBMS system; (2) has not expired;
(3) has not evaluated to a condition that executed an action
causing the query to terminate; and (4) has an
operator that specifies as input object(s) from said camera.
A query target is an object that appears in the field of
view of a camera, and satisfies two conditions: (1) it is a
static object, dynamic object, or cross-camera dynamic
Table 1 Privacy filter values for parameter type Target

Attribute        | Description                                                                                          | Priority
None             | No privacy                                                                                           | 1
QueryTargets     | Targets of active queries are obscured                                                               | 2
NonQueryTargets  | Objects that are not targets of active queries are masked. An active query may obscure their identity | 2
PreviouslyMasked | Specifies that objects that were previously masked will continue to be masked                        | 2
All              | All object identities are masked, regardless of query status                                         | 3
Table 2 Privacy filter values for parameter type TemporalScope

Attribute      | Description                                                                                                                  | Priority
None           | No privacy                                                                                                                   | 1
QueryNonActive | Privacy settings apply only when a query is not active                                                                       | 2
QueryActive    | Privacy settings apply only when a privacy-enabled query is active (in the case of privacy applied to a camera, for example) | 2
Permanent      | Privacy settings apply for the lifetime of the object, camera, or query                                                      | 3
Table 3 Privacy filter values for parameter type ObjectScope

Attribute          | Description                                       | Priority
None               | No privacy (no relevant objects qualify)          | 1
CrossCameraDynamic | Objects that are first detected in another camera | 2
Dynamic            | Dynamic (automatically detected) objects          | 2
Static             | Static (user-defined) objects                     | 2
All                | All classes of objects qualify                    | 3
object associated with said camera, and (2) it is referenced
as an operand (directly in the case of a static object, or
indirectly in the other two cases) in an active
query over the camera in which it appears. In Fig. 8,
dynamic object D121 is contained within an active query
and is a QueryTarget. D102 is not associated with a query
and has the status NonQueryTargets.

The ObjectScope privacy filter attributes are the different
classes of objects identified by the LVDBMS and
explained previously. The TemporalScope attribute of a
privacy filter applies at the camera level as either None or
Permanent; we currently do not support a more granular
temporal operator.
3.8 Privacy filters applied to queries
Privacy filters may also be associated with queries. Default
privacy filters may be configured at the system level (in the
configuration corresponding to the Stream Processing
Layer host), or a filter may be specified when the query is
instantiated. Once a query is associated with a privacy
filter, that privacy filter is retained with the query for the
lifetime of the query. A query's privacy filter is kept in the
stream processing layer Query Executive along with other
query metadata, such as the number of sub-queries the query
has been decomposed into, which stream processing layer
hosts the query has been sent to, etc.

When applied to a query, the TemporalScope attributes
(Table 2) have only two distinct behaviors, None and
Permanent (QueryActive is treated as equivalent to
Permanent, since both refer to the lifespan of the query).
Example 2 A Department of Transportation (DOT)
Traffic Management Center (TMC) makes available live
video feeds from cameras mounted along major roadways
for broadcast on nightly television news segments.
The TMC provides these video feeds so that the public can
observe traffic pattern trends in real time (such as how
quickly traffic is moving across a particular bridge), and so
that news broadcasters can announce traffic incidents causing
lane blockages. However, TMC personnel do not want to
embarrass individuals involved in specific traffic incidents,
or to broadcast identifiers such as license plate numbers.
The TMC objective is to provide real-time video with a
blur applied to all objects. Object types (e.g., car, truck, and
pedestrian) and colors can be distinguished by observers,
but not individual identifiers such as faces and license plate
numbers (Fig. 9). (Note that only TMC personnel can
control camera functions such as zoom.) This is accomplished
by creating a query with an Appear() operator and a
corresponding privacy filter {None, None, All}. A privacy
filter must be associated with the query because, by default,
the query would run with the privacy filter of the user who
created it; since a TMC operator's view is not restricted,
a query from a TMC operator would otherwise have no
privacy filter applied.
3.9 Privacy filters applied to users
Default privacy filters may also be defined that apply to
users of the system who connect through the client GUI
and can browse video cameras. When a user's GUI connects
to the Stream Processing Layer host, the connection
is registered with the Session Manager, which records
connectivity information such as client IP and port, starts a
heartbeat service, and associates a privacy filter with the
registration if necessary. The heartbeat service keeps track
of clients who are connected to the system and deallocates
resources for clients who disconnect. Clients can run queries
which do not have actions specified, but which continuously
return evaluation results to the GUI for the user to watch.
Such queries will be aborted if the corresponding client
remains disconnected for a period of time. When applied to
a user, the TemporalScope attributes (Table 2) have only
two effective behaviors, None and Permanent (where
Permanent refers to the time the client is connected to the
LVDBMS). The other attributes behave as described
previously.
Example 3 A security monitoring application is written
for the LVDBMS. It has a predefined set of queries that
security guards can select from and view on their GUI.
They can also monitor any camera associated with the
system, but only see the identities of people who satisfy the
query conditions (for example, someone who has been
standing in the same place for more than 5 min). If a
security guard watches that video feed, the person who has
not moved will be unobstructed, but people walking nearby
will have their images masked. After 5 min, a query action
is triggered that records the video; this recording query is
not associated with the security guard's privacy filter and
records the entire camera view without obstructions.

Fig. 9 The two dynamic objects depicted here have their details
obscured by the privacy filter
is triggered that records the video, this query is not asso-
ciated with the security guards privacy filter and records
the entire camera view without obstructions.
3.10 Combining privacy filters
Privacy filters associated with cameras, users, and queries
must all be combined to determine which objects in which
frames of the video will have their identities obscured when
the video is viewed by a user or saved to a persistent file.
When a user requests to view video from a camera, the
user's privacy filter is sent to the corresponding camera
adapter. It is combined with the camera's and view's privacy
filters (if applicable), and then the user's GUI connects
directly to the camera adapter to receive live video. In the
case of a query, the camera's privacy filter is sent as
metadata, along with the image descriptor, to the corresponding
spatial processing layer host. If a query action requires a live
video stream (e.g., to save it to disk or direct it to a video
monitor), then the query's privacy filter will be pushed
down to the spatial processing layer host which is connected
to the camera adapter.

When privacy filters interact at multiple levels, the
effective privacy filter must be calculated. Each attribute in
the privacy filter 3-tuple has a value which is assigned a
priority (or no value is specified, in which case that
attribute is not factored into the privacy calculation). When
combining attributes, the highest-priority attribute value is
taken, where a higher priority corresponds to more object
observations being redacted from the output video stream.
Attribute priorities are specified in the Priority column in
Tables 1, 2, and 3. When combining privacy filters, if they
have different attribute values with the same priority for the
same attribute, then the effective attribute value chosen is
the value at the next higher priority for that attribute. This
procedure must be repeated each time a new query becomes
associated with a camera object, or a query expires.
Example 4 Given a camera object with privacy filter
{NonQueryTargets, QueryActive, CrossCameraDynamic}
and a query object associated with the camera with filter
{QueryTargets, QueryActive, Dynamic}, the effective privacy
filter will be {All, Permanent, All}. That is, the priorities
are {2, 2, 2} and {2, 2, 2}; where the tuples have different
attribute values that are of equal priority, this is reconciled
by giving the attribute the value at the next higher priority.
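The reconciliation procedure can be sketched in Python (names are illustrative; priorities come from Tables 1, 2, and 3). Note that in a query context QueryActive is treated as equivalent to Permanent (Sect. 3.8), which this sketch does not model:

```python
# Priority 3 holds the single most restrictive value for each attribute.
TOP_VALUE = {"target": "All", "temporal_scope": "Permanent", "object_scope": "All"}

PRIORITY = {
    "target": {"None": 1, "QueryTargets": 2, "NonQueryTargets": 2,
               "PreviouslyMasked": 2, "All": 3},
    "temporal_scope": {"None": 1, "QueryNonActive": 2, "QueryActive": 2,
                       "Permanent": 3},
    "object_scope": {"None": 1, "CrossCameraDynamic": 2, "Dynamic": 2,
                     "Static": 2, "All": 3},
}

def combine_attribute(attr, a, b):
    pa, pb = PRIORITY[attr][a], PRIORITY[attr][b]
    if a == b:
        return a                      # identical values: nothing to reconcile
    if pa != pb:
        return a if pa > pb else b    # take the higher-priority value
    return TOP_VALUE[attr]            # equal priority, different values: escalate

def combine(f1, f2):
    """Effective filter of two 3-tuples (target, temporal_scope, object_scope)."""
    attrs = ("target", "temporal_scope", "object_scope")
    return tuple(combine_attribute(n, x, y) for n, x, y in zip(attrs, f1, f2))
```

Applied to Example 1, `combine(("All", "None", "None"), ("QueryTargets", "QueryActive", "Dynamic"))` yields the effective state {All, QueryActive, Dynamic}.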
3.11 Tracking based upon a multifaceted object model
In order to implement the cross-camera dynamic object
operand in LVSQL, we developed a camera-to-camera
tracking technique based upon constructing an appearance
model of the objects appearing in video streams. Objects
are tracked from frame to frame using a traditional tracking
technique (e.g., [32]), which we refer to as a frame-to-frame
tracker since it tracks objects within a single video
stream. When an object appears in a consecutive sequence
of frames, the frame-to-frame tracker assigns a unique
identifier to the object as part of the tracking process. A
feature vector based upon the object's appearance is also
calculated. An object is represented as a bag of multiple
instances [6, 7], where each instance is a feature vector
based upon the object's visual appearance at a point in the
video stream. Therefore, an object can be viewed as a set of
points in the multidimensional feature space, referred to as
a point set (Fig. 10). Note that the k instances in a bag are
derived from k samplings of the object, which are not
necessarily taken from consecutive frames.
A FIFO database holds the multiple-instance bags of
objects recently detected by the different cameras in the
system. As a new observation becomes available, the bag
is updated by adding the new instance and removing
the oldest instance. As surveillance queries generally
concern real-time events that have occurred recently, the
FIFO database is typically very small, and in our prototype
we implemented it as a distributed in-memory database
system (distributed among spatial processing layer hosts).

Cross-camera tracking is performed as follows. When an
object is detected by a camera, its multiple-instance bag is
extracted from consecutive frames in the video stream and
used as an example to retrieve a sufficiently similar bag from
the distributed object-tracking database. If there exists
another bag sufficiently close, based upon the squared
distance metric, then the two bags are considered to
correspond to the same object. On the other hand, if the system
does not find a sufficiently similar bag, the occurrence of
Fig. 10 Multifaceted object representation model in which an object
is represented by its point set (i.e., feature vectors)
this newly detected object is considered the object's first
appearance in the system.
To support the retrieval operations, the distributed in-
memory database needs to compute similarity between
bags. Given two bags of multiple instances

$X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\}$ and $X' = \{\vec{x}'_1, \vec{x}'_2, \ldots, \vec{x}'_k\}$,

where k is the cardinality of the bags, we compute their
similarity as follows:

$d_m(X, X') = \min_{s_i, s'_i} \sum_{i=1}^{m} \left\| \vec{x}_{s_i} - \vec{x}'_{s'_i} \right\|^2$,

where $m \le k$ is a tuning factor and $\| \vec{x}_{s_i} - \vec{x}'_{s'_i} \|^2$
is the squared distance between the two vectors. This distance
function computes the smallest sum of pairwise distances between
the two bags. Although we can set m = k, a smaller m value is more
suitable for real-time computation of the distance function. For
instance, if m = 1, two objects are considered the same if their
appearances look similar according to some single observation. We
set m = 5 in our study. Traditionally, each
object is represented as a feature vector, i.e., a single point,
instead of a point set, in the multidimensional feature
space. This representation is less effective for object
recognition. For clarity's sake, let us consider a simple case in
which two different persons currently appear in the
surveillance system. One person wears a two-color t-shirt,
white in the front and blue in the back. Another person
wears a totally white t-shirt. If the feature vectors extracted
from these two persons are based on their front views, the
two would be incorrectly recognized as the same object. In
contrast, the proposed multifaceted model also takes into
account the back of the t-shirt and will be able to tell them
apart. The bag model is more indicative of the objects.
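The minimization above can be read as pairing each instance with at most one instance of the other bag and summing the m closest such pairs. The greedy Python sketch below follows that reading (an assumption for illustration; the paper does not spell out the selection constraint):

```python
from itertools import product

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def bag_distance(X, Xp, m):
    """d_m(X, X'): smallest sum of m pairwise squared distances between
    the two bags, using each instance at most once (greedy sketch)."""
    pairs = sorted(
        (sq_dist(x, xp), i, j)
        for (i, x), (j, xp) in product(enumerate(X), enumerate(Xp))
    )
    used_i, used_j = set(), set()
    total, taken = 0.0, 0
    for d, i, j in pairs:
        if i in used_i or j in used_j:
            continue  # each instance contributes to only one pair
        used_i.add(i)
        used_j.add(j)
        total += d
        taken += 1
        if taken == m:
            break
    return total
```

With m = 1 this reduces to the single closest pair of observations, matching the m = 1 case described above.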
The FIFO database is implemented as a series of queues
and hash tables residing in spatial processing layer hosts.
Each host maintains indices (hash tables) of the bags of
objects observed in the video streams from its corresponding
camera adapters. Indices associate objects with the specific
video frames in which they were observed, video frames
with video streams, objects with bags, objects with queries
over their corresponding video streams, etc. An object
appearing in two separate video streams will initially have two
separate bags in the index and two separate identifiers
(camera adapter identifier, local object tracking number,
and host identifier concatenated into a string). If the two
objects are determined to be the same object, their bags are
merged and the index is updated such that both object
identifiers point to the same (merged) bag of observations.
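A minimal in-memory sketch of this index follows (Python for illustration; the bag capacity and identifier format are assumptions, and the prototype is a distributed C# implementation):

```python
from collections import deque

BAG_CAPACITY = 50  # assumed number of instances retained per bag

class ObjectIndex:
    """FIFO store mapping object identifiers to bags of instances.
    Two identifiers may point at the same bag after a cross-camera merge."""
    def __init__(self):
        self.bags = {}  # object id -> deque of feature vectors

    def observe(self, obj_id, feature_vector):
        # Adding a new instance evicts the oldest once the bag is full.
        bag = self.bags.setdefault(obj_id, deque(maxlen=BAG_CAPACITY))
        bag.append(feature_vector)

    def merge(self, id_a, id_b):
        # The two objects were judged to be the same: merge their bags and
        # point both identifiers at the merged bag.
        merged = deque(self.bags[id_a], maxlen=BAG_CAPACITY)
        merged.extend(self.bags[id_b])
        self.bags[id_a] = self.bags[id_b] = merged
```

The `deque` with a fixed `maxlen` gives the FIFO eviction behavior described above: appending a new observation silently discards the oldest.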
Example 5 Cross-camera tracking allows queries to be
issued that consider multiple cameras to determine if an
event has occurred. For example, consider employees who
work in a building that does not permit smoking inside, but
has a back door and a bench next to the door for smokers to
sit. When someone comes out of the building and sits at the
bench, we assume it is an employee. When someone comes
down a nearby street and waits by the door, their motives
are unknown to us and a security guard should be notified
to observe the situation. In the LVDBMS, this can be
accomplished by creating a query over both cameras with a
cross-camera dynamic operand, to detect when someone
appears in the street camera (which does not observe the
bench or door) and then appears in a camera observing the
back door and bench. Thus, query targets are objects
appearing first in one camera and then in the second. An
associated privacy filter would obscure non-query targets
(objects appearing in only one camera, or appearing in the
back door camera and later in the street view camera). The
privacy filter would be {NonQueryTargets, None, None}
and the query Before(Appear(V1.#), Appear(V2.#)),
where V1 shows the street and V2 the back door.
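Ignoring the Before operator's frame-window parameter, the event logic of this example can be sketched over per-frame appearance booleans (a simplification; names are illustrative):

```python
def before(appear_v1, appear_v2):
    """Evaluate Before(Appear(V1.#), Appear(V2.#)) over per-frame booleans:
    true at frame t if the V2 appearance at t was preceded (at or before t)
    by a V1 appearance. Sketch only; the LVDBMS Before operator also
    applies a bounded frame window."""
    seen_v1 = False
    results = []
    for a1, a2 in zip(appear_v1, appear_v2):
        seen_v1 = seen_v1 or a1
        results.append(seen_v1 and a2)
    return results
```

An appearance in V2 with no prior appearance in V1 never satisfies the condition, which is exactly what distinguishes the "employee from the back door" from the "stranger from the street."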
4 Evaluation
This section describes the experimental conditions under
which the LVDBMS software was evaluated and the
experimental results. The focus of the LVDBMS is on real-time
environments, as opposed to a system that operates
only with pre-recorded video. Thus, the time required for
periodic activities such as evaluating a query must be
less than the interval at which queries are evaluated;
otherwise the query evaluation queue will grow without bound
and query results will not be returned in a timely fashion.
In addition, because video streams entering the system
from cameras are unbounded (the camera can always be
turned on and transmitting video), only a small amount of
data can be retained within a sliding window before it must
be discarded to make room for new data. Thus, implementing
privacy protection in real time is different from, and more
challenging than, doing so off-line because (1) the time
available to carry out data processing operations is bounded,
and (2) due to storage limitations, only a small portion of
the video data may be retained at any particular time in its
raw (unsummarized) format and must be processed in one
pass. In off-line processing, the video data is stored and can
be processed with multiple passes over the data, for example,
to create an index structure to be used in a later processing stage.
4.1 Experimental setup
To test the effectiveness of the LVDBMS, we utilize three
sets of videos, where each video set satisfies a different
objective. We number the data sets from 1 to 3 as follows:
(1) We created a series of reference videos by placing
cameras in three locations in a campus building: inside two
laboratories (rooms) and in a hallway. Each laboratory has
slightly different lighting with no external windows, and
the hallway has exterior windows along one wall. This
provides reference videos with changing lighting conditions,
and the subjects at times are obscured by desks,
chairs, and tables. This creates a challenging environment
in which to track objects from one camera to another. This
series of videos involved 5 people, with on average 2 or 3
people appearing in the field of view at any particular time.
(2) Videos from the CAVIAR project (http://homepages.inf.ed.
ac.uk/rbf/CAVIAR/) are utilized. These are low-resolution
videos that provide coverage of the same scene from
two different views, front and side. This is challenging
because the video resolution is small by today's standards,
and the objects appearing in the videos have relatively few
pixels to contribute toward building the appearance model
(bag of feature vectors). (3) We created a series of videos
recording traffic on roads (cars, trucks, and a few pedestrians
were observed). Automobiles are rigid objects that
do not change shape while driving, and they move in
constrained patterns (following the road). This scenario
provides an excellent testbed for spatial operators, such
as Appears, North, and West, with approximately perfect
tracking accuracy within a video stream.
We evaluated the LVDBMS with pre-recorded videos so
that the same conditions could be reproduced with different
configuration parameters. In our LVDBMS implementation,
image frames are presented to the camera adapter in
one of two ways: an initial processing thread either reads
the image data from a memory buffer that is written to by a
device driver for the camera hardware, or the image data
are extracted from a video file by a video codec. Once a
frame is extracted, it is enqueued in a new-frame queue. A
second processing thread retrieves frames from the new-frame
queue and proceeds to separate foreground pixels
from background pixels, etc. Once a frame of image data
has been enqueued on the new-frame queue, it is
indistinguishable for the rest of its lifetime whether the frame was
extracted from a live camera or from a pre-recorded video
file. Therefore, the frame's original video source is invisible
to the rest of the system processing pipeline and has no
effect on query processing or other system behavior.
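A minimal sketch of this two-thread pipeline (Python with hypothetical names; the prototype is written in C#):

```python
import queue
import threading

new_frame_queue = queue.Queue(maxsize=64)  # bounded buffer between the threads

def capture_thread(frame_source):
    """Thread 1: read frames from a camera driver buffer or a video codec
    and enqueue them; the source is invisible past this point."""
    for frame in frame_source:
        new_frame_queue.put(frame)
    new_frame_queue.put(None)  # sentinel: end of stream

def segmentation_thread(results):
    """Thread 2: dequeue frames and separate foreground from background
    (stubbed here as a tagging no-op)."""
    while True:
        frame = new_frame_queue.get()
        if frame is None:
            break
        results.append(("segmented", frame))

results = []
t1 = threading.Thread(target=capture_thread, args=(["f0", "f1", "f2"],))
t2 = threading.Thread(target=segmentation_thread, args=(results,))
t1.start()
t2.start()
t1.join()
t2.join()
```

Because both a live driver buffer and a file codec feed the same queue, the downstream stages cannot tell the two sources apart, which is what makes evaluation with pre-recorded video representative.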
For these experiments, the frame-to-frame tracker is
configured to ignore detected objects less than 200 pixels in
area, which we consider noise (this parameter is
configurable at the camera adapter level). Software in all
tiers takes configuration settings from XML files and, to
facilitate scripting, also accepts command-line arguments.
LVDBMS core components are implemented in C# and
utilize Language Integrated Query (LINQ) to maintain
some internal queues and hash indexes. For the experimental
results presented in this article, the LVDBMS server
layers ran on a Windows 7 computer with a 3 GHz
Pentium IV CPU with Hyper-Threading and 3 GB RAM
(Dell Precision 370), and the camera adapters on a
2.54 GHz Core 2 Duo Latitude E6500 with 4 GB RAM. For
the eight-camera experiment discussed in Sect. 4.4.1, the
camera adapters were hosted on a Windows 7 HP Pavilion
laptop with a 2.3 GHz quad-core CPU and 4 GB RAM. We
use Emgu CV, a .NET wrapper for the Intel OpenCV
library, for low-level visual operators in conjunction
with the Intel Integrated Performance Primitives (IPP) library.
4.2 Effectiveness of privacy filters
The purpose of privacy filters is to prevent qualified
objects in videos from being identified. In this section, we
provide an example of how an object that is obscured by
a privacy filter is displayed to users via the LVDBMS
client. Figure 11 illustrates two separate situations where
privacy filters obscure the identification of detected
objects. On the left-hand side, a person is walking toward
a door, and on the right-hand side a vehicle is traveling
down a street in a traffic-counting application. In these
examples, objects are obscured by blurring the pixels
contained in their bounding boxes. Applying a blur maintains a
visually appealing image in which obscured objects do
not significantly stand out, but other options, such as an
adaptive blur based upon the size of the bounding box, or
simply setting the entire rectangle to a solid color such as
black, are available depending upon how much of the
object's appearance should be obscured from the
video stream (this choice is not a focus of this work). Additional
privacy-preserving measures, such as increasing the size of
the box to mask the size and shape of the object being
protected, are also options, with the tradeoff of decreasing
the utility of the video (as more of the video is obscured
to the viewer).
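As one concrete masking operator, the solid-fill option can be sketched by replacing a bounding box with the region's average intensity (a pure-Python, grayscale illustration; the prototype uses OpenCV via Emgu CV):

```python
def mask_region(frame, bbox):
    """Obscure a bounding box by filling it with the region's average
    intensity. `frame` is a list of rows of grayscale pixel values;
    `bbox` is (x0, y0, x1, y1) with inclusive corners."""
    x0, y0, x1, y1 = bbox
    pixels = [frame[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    avg = sum(pixels) // len(pixels)
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            frame[y][x] = avg  # flatten the region to a single value
    return frame
```

Swapping the fill for a blur kernel, or enlarging the box before filling, yields the variants discussed above with correspondingly more or less residual detail.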
Privacy filter effectiveness is a function of the effectiveness
of the object detection logic and, depending upon
the query, the tracking logic. In the left image in Fig. 11,
the walking person satisfies the query condition Appears(),
as do false positives (FPs) identified by the background
segmentation algorithm due to a person walking through a
door and the door closing. In this case, the relevant object's
identity is obscured, as well as four FP areas (which can be
reduced by adjusting camera parameters), and the privacy
condition is satisfied. In this section, we do not provide a
separate table of privacy evaluation results because, had
the queries presented in Table 4 had privacy filters enabled,
the privacy filter effectiveness would have been exactly the
same as the tracking accuracy presented in the third column
of Table 4.
The accuracy with which privacy filters correctly obscure
the appearance of an object in a video stream is related
directly to the specification of the privacy filter and, if it is
associated with a query, to the query. For example, a privacy
filter that obscures all objects in a video stream depends
upon the object segmentation algorithm to correctly
separate objects from the video background. Today's
background segmentation algorithms are very accurate, and
in the cases where they err (such as complex moving
backgrounds), the errors would be incorrectly considered
objects and have their appearances obscured. Likewise,
tracking algorithms which track objects visible in
consecutive frames of video from a single camera are also
accurate, though less so than simple foreground/background
extraction. Tracking objects from one camera to another is
a more difficult problem and, as indicated in Table 4, less
accurate still. If two objects appear visually the same in two
cameras, it is a difficult problem to determine whether they
are in fact the same object without an additional aid such as
a security ID card with an RFID tag. To maximize privacy
in these latter situations, one could construct a privacy
filter that obscures all visible objects (rather than filtering
based upon query target or non-query target), thus
minimizing the effect of query accuracy. If an object has an
appearance that is sufficiently similar to the background, it
would not be detected by the background segmentation
algorithm and would not have a privacy filter applied to it.
As soon as it moved, or its appearance changed such that it
looked sufficiently distinct from the background around it,
it would be recognized as a salient object by the LVDBMS
and any applicable privacy filters would apply to it.
4.3 Privacy filter demonstration
This section provides several scenarios showing the
application of privacy filters. The first demonstrates a
transportation application in which a Traffic Management
Center (TMC) operator's terminal and a live news
feed originate from the same traffic camera. The query
Appears(v1.*, 250), which monitors for objects in the video
stream sized 250 pixels or greater, is active on the camera,
v1. The live news feed is served through a view which has
associated with it a privacy filter specifying that query
targets should be obscured, as illustrated in Fig. 12.

Figure 13 shows a screenshot of the traffic camera video
feed as the TMC operator would observe it. Through the
LVDBMS, the operator views images from the camera with
no privacy filters associated. The live video provided for
television, however, can only obtain video from the view,
view1. This view has a privacy filter associated with it,
which applies to all objects that are query targets, that is,
objects which might contribute to a query evaluating to
true. This privacy filter has an effect only when a query is
active (in this example the query monitors for objects
appearing in the video stream which are larger than a
specified area in pixels). Figure 14 provides an example of
video observed through view1 with the query active.
Fig. 11 Examples of object identities obscured by privacy filters.
The left image is from data set (1) and the right from data set (3)
Table 4 Continuous query evaluation results

Query name                        | Description                                                                                                                                                                             | Accuracy
Appear                            | True if an object with area greater than 100 pixels appears in the frame, else false                                                                                                    | 100%
North before south                | True if there exists an object moving with downward motion. The Before operator has a window size of 20 frames; if the object stops or changes direction for less than 20 frames it is still considered true | 100%
Appear across cameras             | A person appears in camera 1 and then is recognized when they appear in a second camera                                                                                                 | 83% (TP = 20, FN = 4)
Appear, then cover across cameras | An object appears in camera 1 then goes through a door (outlined by a static object) in the second camera                                                                               | 91% (TP = 22, FN = 2)
The next example (Fig. 15) demonstrates privacy filters
at multiple levels of the LVDBMS hierarchy. A name plaque
is mounted in a corridor and must be obscured in video
streams sent to all consumers of this video source. To
accomplish this, a static object is defined over the plaque and
a privacy filter is associated with the camera (Fig. 16). This
camera-level privacy filter is propagated to all consumers
of this video stream and factors into the effective privacy
filter computation for each consumer's video stream.

The camera is accessed by two users, User 1 and User 2.
User 1 is not explicitly assigned a privacy filter, and User 2
has been assigned a privacy filter that is applicable to all
objects of type dynamic. (For example, User 2 might only
need to recognize general activity, and notify a supervisor,
User 1, when a closer review is required.) When the video
is viewed from User 1's terminal, the camera-level privacy
filter is propagated and applied to the static object drawn
around the plaque. The result is that the plaque is obscured
in the video output on User 1's terminal (Fig. 17). User 2 is
explicitly assigned a privacy filter that applies to all
dynamic objects in the video stream; this filter is propagated
to User 2 by the LVDBMS automatically. When the frames
are rendered into the video stream for User 2, the privacy
filters are combined (per the discussion in Sect. 3.10) and
the effective privacy filter applies to all objects identified in
the video stream, as illustrated in Fig. 18. Both terminal
images in Fig. 18 are from the same video stream, but
illustrate two different system configuration settings,
depending upon how much detail should be revealed about
obscured objects. The upper image has a blur operator
applied, which blurs identifying features while providing the
operator substantial visual information to observe behaviors.
The lower terminal image simply applies a bounding box
filled with the average pixel color of the region to be obscured.
4.4 Query evaluation accuracy for event monitoring
tasks
Two important aims of the LVDBMS are overall usability
and the ability to specify privacy policies in terms of
Relevant Object                      | Privacy Filter
Camera                               | None
User 1: TMC Operator                 | None
User 2: News station live video feed | None
View1                                | Target=QueryTargets

Fig. 12 Transportation example showing a video source, v1, providing
live video which is consumed by a TMC operator and live television.
The table (right) shows privacy filters associated with various objects
in this scenario
Fig. 13 Unmodified video originating from camera, v1, and as
viewed by the TMC operator who does not have a privacy filter
Fig. 14 Live video as viewed through the view view1 with privacy
filter and active query
A general framework for managing and processing live video data
123
-
8/2/2019 A General Network for Managing and Processing Live Video Data With Privacy Protection
16/21
privacy filters. To be relevant to surveillance applications,
the ability to define queries which detect noteworthy events
is important. In addition, privacy, and the ability to main-
tain some level of privacy for objects (i.e., people, identi-
fiable automobiles, etc.) as surveillance systems become
more automated and pervasive, is also important. Thus, in
this study, we redesigned the LVSQL query language to be
more concise and easier to use. Query accuracy is the
accuracy of detecting user-posed events of interest by
the LVDBMS. In the experiments in this section, we test whether the LVDBMS correctly interprets and evaluates four continuous queries, i.e., whether the conditions in the video are actually true when the query indicates a true condition. Query results are tabulated by manually monitoring the videos taken from dataset (1) alongside the query results in the LVDBMS GUI, and recording the outcome every 5 s (by incrementing TP, TN, FP, or FN). Each query is evaluated over a 2-min period.
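The 5-s tallying procedure can be sketched as follows; the (ground truth, query result) pairs stand in for the manual observations described above.

```python
def tally(observations):
    """Tally confusion-matrix counts from (ground_truth, query_result)
    pairs sampled every 5 s while a continuous query runs.

    Each pair holds booleans: whether the queried condition actually
    held in the video, and whether the LVDBMS reported it as true.
    """
    tp = tn = fp = fn = 0
    for actual, reported in observations:
        if reported and actual:
            tp += 1
        elif reported and not actual:
            fp += 1
        elif not reported and not actual:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# A 2-min evaluation at one sample every 5 s yields 24 samples.
samples = [(True, True)] * 20 + [(False, False)] * 3 + [(True, False)]
print(tally(samples))  # {'TP': 20, 'TN': 3, 'FP': 0, 'FN': 1}
```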
Fig. 15 Surveillance scenario illustrating the application of multiple layers of privacy filters to different types of objects:

Relevant Object   Privacy Filter
Camera            ObjectScope=Static
User 1            None
User 2            ObjectScope=Dynamic

Fig. 16 Unobscured view from camera indicating mounted name plaque

Fig. 17 Video stream, as observed by User 1, with the static object obscured with a solid pattern. The manner in which an object is removed from the video stream (solid box or blur) is configurable

Fig. 18 Video stream, as observed by User 2, with blur (upper) and solid (lower) patterns obscuring objects in the video stream
A. J. Aved, K. A. Hua
123
As expected, the accuracy for queries involving a single
video stream is extremely high. The accuracy of queries
that correlate objects across multiple camera views is
related to the accuracy of the underlying cross-camera
tracking infrastructure, as reflected in the two cross-camera query experiments. The short (2-min) experiments allow only a few instances of each object to be observed and reflected in the index; however, even with only a few bags representing objects in the index, there were no false positives (FPs) or false negatives (FNs) that could be attributed to associating an object in one video with the wrong object in the other video. To determine query evaluation performance, we
constructed four queries, two single-camera queries, and
two multi-camera queries, and present the results in
Table 4.
4.4.1 Query processing performance
The resolution of a continuous query is the frequency at which it is evaluated. In order to be usable in a real-time
system, query processing must be completed within a
bounded amount of time for query evaluation to not
become backlogged (and thus out of sync with the video
image a user might observe) with respect to frames from
streaming video, index updates, etc. We evaluate the per-
formance of the query processing engine by simultaneously
evaluating five queries for a period of 120 s over a random
selection of ten videos. Figure 19 provides the results of
evaluating the five continuous queries over each video, with results combined into a single plot. Table 5 presents summary metrics for the data plotted in Fig. 19, normalized per query (divided by five). The dataset each video came from is indicated after the video's name in the table.
Note that the cost to evaluate a query is a function of the
input to its respective operators; some operators, such as
AND and OR, implement short-circuit evaluation and only
evaluate the second argument if the value of the first is
insufficient to determine the operator result. The data
reported in Table 5 is based upon five queries and one
video. We repeated this experiment with eight simulta-
neous videos, and the performance results were relatively
unchanged from those in Table 5. For the resolution of
video utilized for this experiment, 8 is the maximum number of camera adapter instances that could be run on a 4-core host without the frame processing rate dropping to
an unacceptable level (we consider approximately five
frames per second manageable, but lower processing frame
rates could lead to object segmentation and tracking errors,
for example). Once the video stream has been processed by
the camera adapter, the corresponding spatial processing layer host receives a stream of object size and position updates, along with video frames. The frame-to-frame tracking and background segmentation processes which occur in the camera adapter processing pipeline are the most CPU-intensive stage in the data flow of the LVDBMS system.
Compared to the video data received by the camera
adapters, the quantity of data that flows to the spatial
processing layer, and then to the stream processing layer, is
substantially less at each phase.

Fig. 19 Cost to evaluate five simultaneous queries in terms of CPU time (y-axis: CPU time in milliseconds, 0-180; x-axis: elapsed time in seconds, 1-113), plotted for the videos OneShopOneWait1front, ShopAssistant2cor, SR436_M2U00040, TwoEnterShop1cor, TwoEnterShop1front, TwoEnterShop3cor, TwoLeaveShop1cor, TwoLeaveShop2cor, Walk2, and WalkByShop1cor

Table 5 Average query evaluation cost in terms of CPU time, per query, in milliseconds

Movie                      Min    Max    SD    Avg
SR436_M2U00040 (3)         0.40   5.60   0.73  0.78
OneShopOneWait1front (2)   0.40  30.81   6.26  4.58
ShopAssistant2cor (2)      0.40  21.24   2.72  2.49
TwoEnterShop1cor (2)       1.60  12.00   1.68  2.42
TwoEnterShop1front (2)     3.40  26.60   3.05  5.97
TwoEnterShop3cor (2)       0.40   7.00   0.90  0.87
TwoLeaveShop1cor (2)       1.40  15.60   2.75  3.47
TwoLeaveShop2cor (2)       0.40  14.00   1.86  1.38
Walk2 (2)                  0.40   1.80   0.40  0.72
WalkByShop1cor (2)         0.40   3.00   0.49  0.73

The spatial processing
layer performs index updates and query evaluations, which are not CPU intensive, and in turn sends sub-query evaluations to the stream processing layer at the resolution of
each query (e.g., once each second).
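The short-circuit evaluation of AND and OR mentioned above can be sketched with a minimal operator tree; the class names and the call-counting Leaf are illustrative, not the LVDBMS query engine internals.

```python
class And:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self):
        # Short-circuit: the right operand (and any index lookups it
        # would trigger) is only evaluated when the left side is true.
        return self.left.evaluate() and self.right.evaluate()

class Or:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self):
        # Dually, Or skips the right operand when the left is true.
        return self.left.evaluate() or self.right.evaluate()

class Leaf:
    """Stand-in for a spatial operator whose evaluation would require
    operand lookups in the index structures."""
    calls = 0  # counts how many leaf evaluations were actually paid for

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        Leaf.calls += 1
        return self.value

# With a false left operand, And never evaluates (or pays for) the right.
query = And(Leaf(False), Leaf(True))
print(query.evaluate(), Leaf.calls)  # False 1
```

This is why the per-query cost in Table 5 varies with the input: the number of operand lookups depends on which operands short-circuit away.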
Data presented in Fig. 19 and Table 5 show results from a mixture of queries, all of which evaluated within a period of time well below the query resolution of 1 s throughout the evaluation period. Query evaluation entails computing operator values, which requires operand lookups within index structures, and finally updating metadata for objects to indicate query targets. What we want to emphasize with these results is that, over a wide variety of input videos, query execution is on average well below the 1-s query resolution. Had query execution exceeded 1 s,
query results would be out of sync with video frames
presented to the user via the client.
4.5 Multi-camera object tracking for privacy filter
correctness
This section provides camera-to-camera tracking results from our tracking technique, which uses a multifaceted object model built upon an object's appearances in video
streams. An essential feature of the privacy framework is
the ability to construct a query and use it to either include
or exclude dynamic objects from a privacy filter. Thus,
object tracking and cross-camera object tracking (when a
privacy filter or corresponding query is formulated to make
use of such functionality) correlate positively with privacy
filter accuracy.
Figures 20 and 21 present accuracy results from two
sequences of videos, from dataset (1), that involve
tracking people across a three-camera setup in a laboratory environment as described in Sect. 4.1. In order to
maximize the number of results to present, for this section
we query the index for each observation of each object in
each frame of video. That is, for each frame, we query the
index for the first nearest neighbor of the query point (i.e., the object's feature vector) and return the result if it is sufficiently close; otherwise nothing is returned. If a result is returned and it is the correct object, true positive (TP) is incremented; else false positive (FP) is incremented. Likewise, if no result is returned, true negative (TN) is incremented if the object is not in the index; else we increment false negative (FN). Next, the bag corresponding to the
object is updated to include the currently queried instance
(based upon the cluster identifier assigned by the frame-
to-frame tracker). This process is repeated for each frame
in the video. We present accuracy, measured in terms of
precision and recall:

Accuracy = TP / (FP + FN + TP).
(The accuracy equation does not consider TN because if an
event does not occur it will not be detected. Furthermore, if an event does not occur and we claim that it did occur, that is considered a FP, which is a factor in the equation.) As we
see from the accuracy indicated in Figs. 20 and 21, initially
the feature space is sparse and the bag representations
contain few points (and thus small corresponding standard
deviations along the various dimensions). As more observations are added to the bags in the index, the bag representations become more indicative of what we are likely to
observe of a particular object in the future, and the accu-
racy stabilizes. The object-tracking technique we present is based upon the visual appearance of an object, and its accuracy degrades when more objects appear. Even though a FIFO queue is utilized to limit the duration of time an object is taken into consideration for
tracking purposes, when many objects appear in video
streams simultaneously, the likelihood increases that some
of the objects will look sufficiently similar that they may be
mistaken for one another, resulting in decreased accuracy.
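The per-frame nearest-neighbor query and accuracy tally described in this section can be sketched as follows; the centroid-based index, the distance threshold, and the object identifiers are illustrative assumptions rather than the paper's actual bag representation.

```python
import math

def nearest(index, query_vec, threshold=1.0):
    """Return the id of the nearest indexed object to query_vec, or
    None if the closest match is not sufficiently close
    (the threshold value here is an illustrative assumption)."""
    best_id, best_dist = None, math.inf
    for obj_id, vec in index.items():
        dist = math.dist(vec, query_vec)
        if dist < best_dist:
            best_id, best_dist = obj_id, dist
    return best_id if best_dist <= threshold else None

def accuracy(tp, fp, fn):
    # TN is excluded, as in the paper's accuracy definition.
    return tp / (tp + fp + fn)

# Illustrative index of bag centroids keyed by object id.
index = {"person_a": (0.1, 0.2), "person_b": (5.0, 5.0)}
tp = fp = fn = 0
observations = [("person_a", (0.15, 0.22)), ("person_b", (5.1, 4.9)),
                ("person_c", (9.0, 9.0))]  # person_c not yet indexed
for true_id, feature_vec in observations:
    result = nearest(index, feature_vec)
    if result == true_id:
        tp += 1
    elif result is not None:
        fp += 1
    elif true_id in index:
        fn += 1
    # result is None and true_id not indexed: a TN, excluded from accuracy
print(round(accuracy(tp, fp, fn), 2))
```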
Fig. 20 Cross-camera tracking accuracy for sequence 44

Fig. 21 Cross-camera tracking accuracy for sequence 46

5 Related work

An LVDBMS encapsulates work from a multitude of domains, including continuous query language development and computer vision techniques such as object detection and
tracking. For completeness, we include a review of recent
video surveillance related topics.
5.1 Privacy considerations
As cameras become pervasive, improved video surveil-
lance systems will be required to overcome the limitations imposed by direct and continuous human monitoring. This will result in increasing volumes of video being processed,
published, monitored, and stored. References [5, 10] sug-
gest that privacy is a function of what is deemed culturally
and socially acceptable by society. Several privacy-aware
systems have been developed which can detect movement
and mask it for privacy considerations. For example, in
[27] pedestrians are obscured with multicolored blobs,
where the color specifies a status, such as "crossed a virtual trip wire". Reference [15] develops an MPEG-4 transcoder
and decoder to mask objects in a video stream based upon
movement. While these systems increase privacy by masking the object's identity, they are not helpful in fighting crime because the obscuring is irreversible. Furthermore, they do not provide functionality to determine if an
object should indeed be masked in the output video stream.
Large collections of data provide data mining opportu-
nities for discovering global trends, decision making,
capacity planning, building machine learning classifiers,
etc. Data in its original form, such as hospital patient demographic data, contains information whose disclosure violates individual privacy. Privacy-preserving data publishing (PPDP)
proposes algorithms to make data available for mining
global trends while preserving individual privacy. These
techniques range from monitoring the individual queries
issued, to perturbing the data in various ways. For example,
an attacker might try to identify a patient's record in a
public data set.
The majority of research on privacy control methods
focuses on statistical databases containing tabular data.
Security control methods generally entail query restric-
tions, data perturbation, and output perturbation [2]. Query
restrictions entail monitoring queries, e.g., the number of
queries submitted by a particular user, the amount of
overlapping data that is queried by each user, etc. Data perturbation entails modifying data values stored in the database,
such as replacing the age values of people with the average
age by zip code. Output perturbation involves injecting
error into the query result. Thus, there is a tradeoff between accuracy and confidentiality: introducing higher error lowers the likelihood of identifying particular data values, but skews aggregated results further.
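As a minimal illustration of output perturbation (not a component of the LVDBMS), the following sketch injects Laplace noise into a count query result; the scale parameter makes the accuracy/confidentiality tradeoff explicit.

```python
import math
import random

def perturbed_count(true_count, scale=1.0, rng=random):
    """Return a count query result with Laplace noise injected.

    Larger `scale` means more confidentiality (harder to infer any
    individual's value) but a more skewed aggregate result.
    """
    # Laplace(0, scale) sampled via the inverse transform:
    # u ~ Uniform(-0.5, 0.5); x = -scale * sign(u) * ln(1 - 2|u|)
    u = rng.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(42)
print(perturbed_count(100, scale=1.0, rng=rng))
```

With scale set to zero the true count is returned unchanged; increasing it widens the spread of released values around the true aggregate.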
When privacy filters are applied to video streams, the
effect is a type of data perturbation. In an ideal scenario,
the modified streams should not reveal anything about the
individuals appearing therein [5]. However, [8] has shown
that an absolute guarantee of privacy is unachievable in the
presence of external auxiliary information. Some recent
works, such as [26], investigate identity leakage through
implicit inference channels, such as time of day combined
with camera location. For example, if a camera shows an
office door and one observes a blurred figure entering at 8 a.m. and leaving at 12 p.m., one can assume the obscured person and the person assigned to that office are the same.
Thwarting this type of attack on privacy is beyond the
scope of the method we propose in this study. Our primary
aim is to make objects appearing in a video stream indis-
tinguishable from one another in accordance with the
current privacy specification. We note, however, that with
our framework, identifiers such as office door numbers and
placards can be defined as static objects and an appropriate
privacy specification can be defined to redact them from
the output video stream.
In this study, we present a flexible privacy framework
which has the goal of protecting individual privacy while providing data streams that can be queried for events as
accurately as possible. Thus, we choose to perturb the
output data in some ways (i.e., obscure objects with
bounding boxes of varying degrees of tightness) but not
others (such as skewing the video in the time domain,
adding ghost objects to hide when real ones appear,
etc.).
5.2 Object detection and tracking
There are many existing video surveillance systems over
networked cameras, e.g. [3, 12]. In particular, object rec-
ognition and tracking is a core component of these systems,
forming a basis for high-level analytic functions for scene
understanding and event detection. Since cameras have a
limited resolution and field of view, multiple cameras may
be required to provide coverage over the area of interest.
Typically, the fields of view of adjacent cameras may not
overlap due to economic, environmental, or computational constraints. These practical factors place a great challenge
on tracking objects moving across multiple cameras.
Existing multi-camera tracking environments [14, 18,
19, 21, 22, 28, 30, 32] require various types of calibrations
and/or information on the spatial relationships between the
various cameras, to be configured into the system as known
parameters. They assume overlapping fields of view of the
cameras, or non-random movement patterns. In the latter
scenario, when an object moves from the field of view of a
camera into that of the next camera, this object can be
recognized in the second camera by taking into consider-
ation the speed and trajectory of the object when it exits the
field of view of the first camera [21, 22]. This strategy is
only applicable to non-random movement patterns such as
objects constrained by roads, walls, etc., and cannot be
used for general-purpose a