
REGULAR PAPER

A general framework for managing and processing live video data with privacy protection

Alexander J. Aved · Kien A. Hua

© Springer-Verlag 2011

Abstract  Though a large body of existing work on video surveillance focuses on image and video processing techniques, few address the usability of such systems, and in particular privacy issues. This study fuses concepts from stream processing and content-based image retrieval to construct a privacy-preserving framework for rapid development and deployment of video surveillance applications. Privacy policies, instantiated as privacy filters, may be applied both granularly and hierarchically. Privacy filters are granular as they are applicable to specific objects appearing in the video streams. They are hierarchical because they can be specified at different levels of the framework (e.g., users, cameras) and are combined such that the disseminated video stream adheres to the most stringent aspect specified in the cascade of all privacy filters relevant to a video stream or query. To support this privacy framework, we extend our Live Video Database Model with an informatics-based approach to object recognition and tracking, and add an intrinsic privacy model that provides a level of privacy protection not previously available for real-time streaming video data. The proposed framework also provides a formal approach to implementing and enforcing privacy policies that are verifiable, an important step towards privacy certification of video surveillance systems through a standardized privacy specification language.

Keywords  Query language · Privacy framework · Video database system · Real-time · Object recognition · Object tracking

    1 Introduction

Camera networks have been the subject of intensive research in recent years, and can range from a single camera to a network of tens of thousands of cameras. Usability is an important contributing factor to the effectiveness of such networks. As an example, a camera network in London cost £200 million over a 10-year period. However, police are no more likely to catch offenders in areas with hundreds of cameras than in those with hardly any [29]. This phenomenon is typical for large-scale applications and can be attributed to the fact that the multitude of videos is generally not interesting, and constant manual monitoring of these cameras for occasional critical events can become fatiguing. Due to this limitation in real-time surveillance capability, cities mainly use video networks to archive data for post-crime investigation.

To address operator fatigue and increase the effectiveness of the camera network, automatic video processing techniques have been developed for real-time event detection under various specific scenarios. These systems, such as the live video database discussed in this study, can monitor live video streams in real time for events of interest and can alert human operators upon their detection. However, pervasive monitoring by corporate and governmental entities can lead to privacy concerns. For example, archived video from a police camera network could later be used for purposes other than those for which it was originally collected. Public information laws could make video footage collected for legitimate purposes available to anyone who requests it. In a corporate setting, cameras deployed to record customer behavior might capture employees after their work shift has ended.

Deploying a sizable camera network entails a significant monetary investment.

A. J. Aved (✉) · K. A. Hua
University of Central Florida, Orlando, FL, USA
e-mail: [email protected]

K. A. Hua
e-mail: [email protected]

Multimedia Systems, DOI 10.1007/s00530-011-0245-x


For aesthetic reasons it is also desirable to minimize the number of cameras deployed (e.g., in a historic district of a town). For reasons such as these, it would be beneficial if multiple entities could share access to the cameras. Police could monitor for suspicious activity and crime scene evidence, utility companies could gauge outages after inclement weather, and sanitation departments could assess the productivity of new employee vehicle operators, to name just a few possible collaborators. Possible benefits entail shared deployment costs, and the possibility of providing service to stakeholders who otherwise could not justify the expense of a single-purpose camera network deployment on their own.

However, shared usage and monitoring by indeterminate or changing interests could lead to significant privacy concerns. Thus, to address the usability and privacy concerns of a general-purpose camera network, three factors are important:

1. The software system must support ad hoc monitoring tasks. An event of interest in one domain such as transportation monitoring is generally different from another domain such as crime prevention. Events of interest can also vary significantly between individual users.

2. It is desirable to provide the capability to enable rapid development of customized applications for domain-specific users, such as counting cars along a section of highway, or informing an employee when a nearby conference room becomes available early.

3. People are concerned about privacy and there are increasing objections to pervasive monitoring. For applications that run atop a camera network, there is a need for policies which specify the level of privacy they adhere to, and mechanisms which implement said policies.

To achieve the first two factors, we have designed and implemented a general-purpose Live Video Database Management System (LVDBMS) [25] as a platform to facilitate live video computing. It allows automatic monitoring and management of a large number of live cameras. The cameras in the network are treated as a special class of storage, with the live video streams viewed as database content. The user is able to specify an ad hoc video monitoring task by formulating a query that describes a spatiotemporal event. When the event occurs and is detected by a monitoring query, an action associated with the query is executed. This general-purpose LVDBMS also enables rapid development of live video applications, much as database applications are developed atop standard database management systems today. Another work that allows the user to specify semantically high-level composite events for video surveillance is presented in [31]. However, this technique requires the user to formulate queries procedurally using low-level operators. In contrast, our query language is declarative: the user only defines the event, and the system automatically generates the corresponding query processing procedure.

To address the third factor mentioned previously, we present in this article a privacy framework for the LVDBMS. This framework implements a privacy specification language (PSL) that permits privacy policies to be specified, and enforced by removing identifiable information pertaining to objects from the video streams as they are made available externally by the system. Example consumers of streaming video could be a file for storage, an operator's video terminal, or a live video feed captured for use in a traffic report on a television news broadcast. This facility allows the user to specify various privacy views atop the raw video stream to remove objects from the output stream, with the aim of protecting individual privacy while retaining general trends evidenced in the video stream such that further scene analysis is meaningful. Here objects refer to a person, animal, or vehicle that is not part of the video background. These objects are characterized using a multifaceted object model. As part of the new extension to our prototype, we also introduce in this article an informatics-based cross-camera tracking technique. This scheme permits queries to be defined that span multiple video streams.

The remainder of this article is organized as follows. In Sect. 2, we give an overview of our LVDBMS to make the article self-contained. The proposed privacy framework is introduced in Sect. 3, with experimental results in Sect. 4. Related work is discussed in Sect. 5. Finally, we conclude this article in Sect. 6.

    2 LVDBMS environment

In this section, we briefly introduce the LVDBMS, a distributed video stream processing Live Video Database (LVD) environment, and refer the reader to [25] for further details. Large networks of cameras produce a proliferation of multimedia information, and there is a profound need to manage and organize the volumes of video data into proportions relevant for human consumption. In an environment with numerous video cameras, the goal is to provide human operators with a facility to specify relevant scenes of interest, minimizing exposure to uninteresting and irrelevant scenes. However, today there is a technological gap between what state-of-the-art software technologies can provide in terms of identifying rich content, and consumer expectations. The LVDBMS allows users to mine numerous video streams for events of interest and perform actions when the specified scenarios are encountered.


    2.1 LVDBMS architecture

The LVDBMS is a distributed video database management system. Operators interact with the highest layer, and view video streams and query results through computer terminals (Fig. 1). Figure 2 illustrates the four logical tiers implemented in the LVDBMS architecture, which communicate through web services interfaces. Multiple cameras in the camera layer may be associated with a single host in the spatial processing layer. Spatial processing layer hosts perform real-time motion imagery processing for abstract object recognition and partial query evaluation (dependent upon the data available at the host). Intermediate query results are streamed to a host in the stream processing layer, which periodically computes the final query result that is made available to clients in the client layer. We discuss these software layers in more detail in the following sections.

    2.1.1 Camera layer

The camera layer (Fig. 3, left) consists of physical devices that capture images. Each camera is paired with a camera adapter which runs on a host computer. We do not assume that the cameras have built-in analytical capability other than capturing images and making them available to a corresponding camera adapter.

The camera adapter allows any type of camera to be used with the LVDBMS using relevant drivers. When processing a scene, the camera adapter first performs scene analysis on the raw image data to determine background pixels and foreground objects. The segmented objects then flow into the frame-to-frame tracking module, which tracks objects within consecutive frames in a camera's field of view and assigns them an object number (unique within the camera adapter). For each image a bag of feature vectors is calculated (a bag is similar to a set but allows for duplication).

Once each frame from the camera is processed, it is bundled as an image descriptor and sent to the spatial processing layer for query evaluation. The image descriptor may contain the actual image bitmap if specifically requested by the spatial processing layer host, but otherwise it contains only image metadata, such as the identifiers and locations of objects identified within the frame, the corresponding feature vectors, the frame sequence number, etc.
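As an illustration, the descriptor might carry fields along the following lines. This is a minimal C# sketch; the type and member names are our own, not the prototype's actual classes:

using System.Collections.Generic;
using System.Drawing;

// Hypothetical sketch of the per-frame metadata bundle sent to the
// spatial processing layer; all names here are illustrative.
public sealed class ImageDescriptor
{
    public string CameraId;              // identifies the originating camera adapter
    public long FrameSequenceNumber;     // temporal ordering of frames
    public byte[] Bitmap;                // raw image data; null unless explicitly requested
    public List<DetectedObject> Objects = new List<DetectedObject>();
}

public sealed class DetectedObject
{
    public int ObjectNumber;             // unique within the camera adapter
    public Rectangle BoundingBox;        // location of the object within the frame
    public List<double[]> FeatureBag;    // bag of feature vectors (duplicates allowed)
}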

    2.1.2 Spatial processing layer

Spatial processing layer hosts evaluate spatial and temporal operators over the streams of image descriptors and provide a result stream to the stream processing layer (Fig. 3, middle).

Fig. 1 LVDBMS hardware architecture

Fig. 2 The LVDBMS is logically grouped into three layers plus a client layer


A server hosting the spatial processing layer will service many cameras, but a camera adapter will be associated with only a single instance of the spatial processing layer. Server replication at this layer allows the LVDBMS to scale to an arbitrarily large number of video streams.

    2.1.3 Stream processing layer

The stream processing layer (Fig. 3, right) accepts queries submitted by clients and partial query evaluation results from the spatial processing layer. It does not interact directly with cameras or their adapters. We note that we can have replication at the stream processing layer for fault tolerance. Queries are decomposed into an algebraic tree structure which is partitioned by the host and pushed down to relevant servers in the spatial processing layer. As sub-queries are evaluated, results are streamed back to the stream processing layer, where they are combined and the final query result is computed. Sub-query results may arrive out of order or be lost in the network, and a camera may unexpectedly go offline; such conditions must be handled gracefully.

    2.1.4 Client layer

Users connect to the LVDBMS and submit queries using a graphical user interface, depicted in Fig. 4. The client allows users to browse available video cameras, define and submit queries, and review results.

    2.2 LVDBMS data model

A query is a spatiotemporal specification of an event, posed over video streams, and expressed in Live Video SQL (LVSQL), a structured query language.

Fig. 3 Software layers of the LVDBMS and the major components contained therein

Fig. 4 The LVDBMS client allows users to browse cameras, construct queries, and send system commands


A query defines which streams will be accessed and what information will be returned. In an LVD, information is contained in live video streams which are input to the LVDBMS in real time. The fundamental construct in an LVD is an object, which is either indicated by the user (a static object) or automatically detected (a dynamic object).

We provide a brief description of the LVD data model and refer the reader to [25] for additional details. A video stream consists of temporally ordered frames where each frame represents a snapshot of what was detected by an image sensor at a particular time. An object, then, is some real-world physical entity whose image was captured and is represented in the frame. (For our definition of object, we do not consider the background to be an object unless it is specifically indicated in the query.) There are two types of operators in LVDs: spatial and temporal. Spatial operators are formulated over objects that are visually captured in video streams (i.e., overlaps, meets, disjoint, exists, etc.). Temporal operators evaluate the temporal relationships between spatial events (i.e., before, during, etc.). When constructing LVSQL statements, there are three types of objects that may be referenced:

1. Static objects are indicated by the user by drawing an outline on a frame captured from the video stream, at query submission time.

2. Dynamic objects are objects that appear in a video stream and are not part of the background. They may be specified as an asterisk (*) in the query.

3. Cross-camera dynamic objects are dynamic objects detected in a camera and matched with an object that was (or is) viewed in another camera. In the query language, these objects are denoted with a pound sign (#), e.g., Before(Appear(V1.#), Appear(V2.#), 120), which queries for an object that appears in stream V2 within 120 s of appearing in V1.

As events occur in real time, queries must be resolved in real time as well. Queries that require temporarily storing historical video data are always parameterized such that the data that must be retained for query evaluation is confined to a temporal sliding window. The resolution of a query refers to the frequency with which the query is evaluated. For example, a query that is evaluated five times each second is of finer resolution than a query evaluated once each second.
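A minimal sketch of such a sliding window follows (C#, our own naming): frames are enqueued as they arrive, and the oldest are discarded once the window capacity, derived from the active queries' requirements, is exceeded.

using System.Collections.Generic;

// Illustrative bounded sliding window; not the prototype's actual class.
public sealed class SlidingWindow<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly int capacity;   // derived from active-query requirements

    public SlidingWindow(int capacity) { this.capacity = capacity; }

    public void Push(T item)
    {
        items.Enqueue(item);
        while (items.Count > capacity)
            items.Dequeue();         // discard data older than the window
    }

    public int Count { get { return items.Count; } }
}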

    2.3 LVDBMS query language

The LVSQL query language is used to pose continuous queries over events that occur in live video streams in real time. The essential form of an LVSQL query is as follows:

ACTION <Action>
ON EVENT <Event>

where the <Event> syntax is expanded upon in Fig. 5. In LVSQL, spatial operators take objects as arguments, and temporal operators take the output of spatial operators (i.e., bit streams) as arguments. Boolean logic combined with the various temporal and spatial operators results in a very expressive language capability.

All queries must involve a spatial operator; the simplest expressible query could check for the existence of any object appearing in the field of view of a particular camera. This spatial query could then be enhanced with a temporal component, for example, duration: trigger an alarm if an object appears and persists for longer than 10 min; or two spatial operators could be combined with a temporal operator: alert if an object contacts a particular desk, then walks through a door.

    3 Privacy filter framework

In this section, we introduce our privacy framework and explain how it is applied. The implementation details are discussed in Sect. 4. While members of the public generally accept having their image recorded by cameras, it is a violation of their trust to use their data for purposes people may find intrusive, or to have their image used for reasons contrary to the known usage of the cameras.

Fig. 5 LVSQL event specification syntax


Examples of intrusive uses could be a security guard observing shoppers for personal edification during a rest period, or corporate mining of the behavior of individuals in a store so that items may be marketed to them the next time they enter that store. As cameras become ubiquitous in public locations, camera networks become smarter, and storage capacity increases such that larger volumes of video data may be retained for lengthier periods of time, it is an increasing concern that video data collected for one purpose may be used for other purposes. If an intrusive usage were known to the target individual, they might have chosen not to participate, for example by shopping in a competing mall. In this article, we introduce privacy filters as the framework which could anonymize the people who are observed by networked video cameras in accordance with a privacy policy. Depending upon the specifics of the privacy policies being enforced, global trends in the videos could still be visible, such as people going into and out of a room, while the appearances of the individuals could be redacted, thus minimizing the potential for misuse of the video and unintended consequences for the people being observed.

We endeavor to protect the identity of innocent individuals. However, some users need the option of investigating and identifying individuals (if they have the authority to do so). In this scenario, the LVDBMS is designed to allow someone with the proper access to investigate the identity of individuals via unperturbed video streams, while applying privacy filters to protect individual privacy by blocking video stream consumers without sufficient access privileges. Thus, privacy enforcement does not affect the intended utility or intelligibility of the video. The challenge is to accomplish this in real time without restricting who consumes the stream, so that actions triggered by events occurring in the stream are timely and relevant.

    3.1 Scope and assumptions

This section describes the objectives of the transformation that privacy filters induce upon corresponding video streams. It also defines the scope of implementation assumptions inherent to our LVDBMS prototype.

    3.1.1 Scope of privacy applied to video stream output

The proposed privacy framework is based on the concept of privacy filters. A privacy filter implements a privacy policy, which specifies under what circumstances the appearance of objects in video streams passing through the filter may be observed or must be redacted from the stream. The primary goal of privacy filters is to obfuscate the appearance of qualifying objects such that they become unidentifiable after passing through the privacy filter. More precisely, a privacy filter defines a set of criteria. These criteria are matched against objects that are identified in video streams, and can be specified very precisely (e.g., objects that satisfy a particular query condition based upon temporal and spatial location) or generally (e.g., applied to all objects observed by a particular camera). Thus, the scope of privacy filters is the salient objects appearing in the video stream; not the environment, such as the scene background observed by a camera, or other conditions that can be observed such as time of day, conjectures based upon knowledge of the location of the camera from which the video stream originates, etc. Furthermore, privacy filters are applied to video streams as a final step before the streamed data is externalized from the system. In order to maximize query accuracy, queries and internal indices for object tracking are based upon metrics calculated from non-obfuscated data. This raw metadata is never externalized by the LVDBMS or explicitly saved to persistent storage.

    3.1.2 Scope of system prototype implementation

The focus of this research is privacy policies, the realization of those privacy policies as privacy filters, and the corresponding transformations privacy filters have upon video streams. We do not consider as part of this work aspects of system security that must be addressed in an actual physical deployment to a public area. For example, we do not consider the physical aspects of the system, such as the physical security of servers hosting LVDBMS software, cameras, and the communications channels between cameras and LVDBMS hosts. However, we note that such things can be accomplished by other means, such as purchasing fixtures to hold the cameras and enabling encrypted communication tunnels via the operating system or through virtual private networks. Furthermore, we do not attempt to detect and thwart privacy attacks against the system, such as a series of specifically crafted queries issued by a user and designed to leak unintended information. Although we do implement certain safeguards, such as providing a mechanism to restrict which cameras and video streams a user can observe, we assume a user is who she presents herself to be, and not a malicious user masquerading as a legitimate system user.

    3.2 Framework overview

Privacy filters may be applied at different levels in the LVDBMS system hierarchy, and video streams may be affected by multiple privacy filters. This cascade of privacy filters is conceptually similar to how views may restrict columns in a traditional relational database (Fig. 6, left).


When a video stream passes through multiple privacy filters, the effect is that the most stringent privacy level is applied (note that a privacy filter does not necessarily apply to every object appearing in a particular frame of a passing stream; Fig. 6, right). Similarly, in a relational database a user may be allowed to access only views, and relational views may be built upon other views, which may themselves reference the physical tables or yet other views.

In the LVDBMS hierarchy, privacy filters are associated with cameras, queries, user groups, and view objects:

Camera: Any camera within the system can have a privacy filter associated with it. Applied at this level, the privacy filter has the broadest impact, as it affects all consumers of this camera.

Query: A privacy filter at this level has a moderate impact, as it is associated with a specific query. It affects only the consumers of the query's output.

User Group: A privacy filter at this level has the narrowest impact. Only the users in this group are affected.

View: A view, implementing a privacy filter, may be defined over a stream or a previously defined view. Queries and users may access the underlying video stream through the view, with the constraint that the privacy filter will be applied to the view's output.

    3.3 Filter output sanitation model

While the previous section provided a conceptual definition of privacy filters, for clarity this section gives a more precise treatment. Let Q be the set of active queries posed over the set of streams 𝒮 in the LVDBMS. A stream S ∈ 𝒮 in the stream processing layer is a first-in first-out (FIFO) sequence of frames S = {f_i, f_{i−1}, …, f_{i−k+1}}, where k is the lesser of the maximum number of frames required to resolve any active query q ∈ Q and a system-defined maximum. (Note that frame f_i ∈ S for any S represents the most recent image captured by a camera, after a negligible processing and communication delay, and S is maintained in real time as frames are received from spatial processing layer hosts.)

Frames in S are retrieved via a frame access function

Acc(S, k) → frame

which retrieves the frame and corresponding metadata in the kth position of S. When a stream is externalized from the LVDBMS (such as for display on a user's terminal, saving to a file, etc.), it is passed through a sanitizer function

San(S, f) → Acc(S, 1) ⊗ Z(f)

where Z returns a mask that indicates which regions of the frame to obscure in accordance with the privacy filter f, and ⊗ obfuscates the image bitmap contained in the frame with the mask, perturbing the output of San. When selecting from a view with a privacy filter f′, San becomes

San(S, f) → Acc(S, 1) ⊗ Z(f ∗ f′)

where ∗ combines the filters as described in the previous section. In the literature, some sanitizers choose not to answer queries, or add noise drawn from a statistical distribution. However, Z is deterministic in its parameter f. Furthermore, detecting an attack (that is, a determination of information the system attempts to redact with the mask) is beyond the scope of this study, and all queries are assumed to be legitimate.
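The following is a minimal C# sketch of how San and Z might be realized. Frame, PrivacyFilter, and all member names are our own illustration of the formalism, not the prototype's types; the convention assumed here is that index 0 of the stream holds the most recent frame.

using System.Collections.Generic;
using System.Drawing;

// Sketch types mirroring the formal model; all names are illustrative.
class DetectedObject { public Rectangle BoundingBox; }

class Frame
{
    public int Width, Height;
    public Color[,] Pixels;
    public List<DetectedObject> Objects = new List<DetectedObject>();
}

class PrivacyFilter
{
    // True when the filter's criteria match the object
    // (target status, object scope, temporal scope, ...).
    public virtual bool AppliesTo(DetectedObject o) { return true; }
}

static class Sanitizer
{
    // San(S, f): take the newest frame Acc(S, 1), compute the mask Z(f),
    // and obfuscate the masked regions of the bitmap.
    public static Frame San(IReadOnlyList<Frame> s, PrivacyFilter f)
    {
        Frame frame = s[0];                       // Acc(S, 1): most recent frame
        foreach (var o in frame.Objects)
            if (f.AppliesTo(o))                   // Z is deterministic in f
                Redact(frame, o.BoundingBox);
        return frame;
    }

    static void Redact(Frame frame, Rectangle r)
    {
        for (int x = r.Left; x < r.Right; x++)
            for (int y = r.Top; y < r.Bottom; y++)
                frame.Pixels[x, y] = Color.Gray;  // flat fill; a blur also works
    }
}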

Fig. 6 Relational database view (left) compared with an illustration of a cascade of privacy filters (right)


    3.4 PSL

The PSL allows a system administrator to implement privacy policies by constructing privacy filters, and to manage system user access. Privacy filters can be associated with groups, and thus with individual users through their user-group membership. When applied to a group, a privacy filter affects all users in the group. Privacy filters may also be associated with views, in which case they apply to any accessor of the view. (Privacy filters may also be associated with cameras, but this is specified in the configuration file associated with the camera's adapter.) All privacy filters are cumulative; the system does not provide a way to reduce privacy by adding a new privacy filter.

When a user creates a query, their privacy filters (via group memberships) are in turn associated with the query. However, a system administrator can create canned queries which users may run unaffected by the privacy filters associated with the executor's account. For example, such a query could save an unperturbed video stream to a persistent storage location the user does not have access to, if they believe a crime is occurring in the view of a camera to which they do not have unrestricted access. The syntax for the PSL is provided in Fig. 7.
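As an illustration, the following statements (using the Fig. 7 syntax, with hypothetical identifiers) create a filter that permanently masks dynamic objects and attach it to a user group:

CREATE FILTER mask_dynamic
  TEMPORALSCOPE = PERMANENT
  OBJECTSCOPE = DYNAMIC

CREATE USERGROUP visitors

ASSOCIATE USER alice WITH visitors

ASSOCIATE GROUP visitors WITH FILTER mask_dynamic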

    3.5 Design and implementation of privacy filters

This section presents operational and implementation details of privacy filters in the LVDBMS. When two or more privacy filters apply to a stream, they are combined into an effective privacy filter.

An active query is a currently running query in the LVDBMS system. Each query has operators that take object(s) from cameras (i.e., spatial operators) as input. A relevant object is an object that appears in a video stream referenced in an active query, and can potentially contribute to the query evaluating to true. If the query becomes true, the contributing relevant object is called a target object; otherwise, it is a non-target object. Consider the query Contains(C1.S1, C1.*) illustrated in Fig. 8. This query determines whether any dynamic object is detected within the static object S1, represented by the dashed rectangle. Since the dynamic object D121 is contained within S1, D121 is a target object. Since the other dynamic object, D102, is not contained in S1, it is a non-target object. A privacy filter can be specified so that only target objects, only non-target objects, or all relevant objects should be protected (i.e., have their appearance obscured). If a protected object no longer satisfies a privacy filter specification, this object obtains the status previously masked. In this example, if the privacy filter is to blur all the target objects, then the dynamic object D121 is a previously masked object after it leaves the boundary of S1. We note that displaying a live video stream from a given camera is similar to a query which always evaluates to true and whose output video stream is the same as the input video stream.

{CREATE | UPDATE | DELETE} FILTER filter_identifier
  [TARGET = {QUERYTARGETS | NONQUERYTARGETS | PREVIOUSLYMASKED}]
  [TEMPORALSCOPE = {QUERYNONACTIVE | QUERYACTIVE | PERMANENT}]
  [OBJECTSCOPE = {STATIC | DYNAMIC | CROSSCAMERADYNAMIC}]

{CREATE | UPDATE | DELETE} VIEW view_identifier OVER stream_identifier
  [WITH filter_identifier]

{ASSOCIATE | DISASSOCIATE} GROUP group_identifier WITH {FILTER | VIEW} filter_identifier

{CREATE | DELETE} USERGROUP group_identifier

{ASSOCIATE | DISASSOCIATE} USER user_identifier WITH group_identifier

Fig. 7 Privacy specification language syntax; uppercase represents a keyword and italics a user-supplied parameter

Fig. 8 QueryTargets versus NonQueryTargets: D121 satisfies the query condition and is a target


    3.6 Defining privacy filters

Privacy filters are created in the LVDBMS in one of two ways: they are specified in configuration files at system startup, or they are created with the CREATE FILTER PSL statement. A privacy filter is specified by the attribute 3-tuple {Target, TemporalScope, ObjectScope}, where Target, TemporalScope, and ObjectScope are quantified in Tables 1, 2, and 3. Not all attributes are applicable to every use of privacy filters, and an attribute may be set to None. The Target parameter specifies whether the filter applies to only target objects, non-target objects, all relevant objects defined over the camera (static objects defined by users and dynamic objects detected in the video stream), or None, indicating that this attribute should not be considered when determining whether a privacy filter applies to a particular object. ObjectScope refers to the type (scope) of objects the filter applies to: static or dynamic objects, cross-camera dynamic objects, none, or all objects appearing in the stream. TemporalScope indicates when the filter will be applied: always, never (the filter is currently inactive), or when the stream is or is not being accessed by a query.
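To make the 3-tuple concrete, the following C# sketch shows one way it might be represented; the type and member names are ours, with the enumeration members taken from Tables 1, 2, and 3:

// Sketch of the privacy filter 3-tuple; names are illustrative.
enum Target { None, QueryTargets, NonQueryTargets, PreviouslyMasked, All }
enum TemporalScope { None, QueryNonActive, QueryActive, Permanent }
enum ObjectScope { None, CrossCameraDynamic, Dynamic, Static, All }

struct PrivacyFilterTuple
{
    public Target Target;                // who is masked (Table 1)
    public TemporalScope TemporalScope;  // when masking applies (Table 2)
    public ObjectScope ObjectScope;      // which object classes qualify (Table 3)
}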

    3.7 Privacy filters applied to cameras

Logically, we may associate privacy filters with physical cameras, but to make the LVDBMS software more flexible with respect to the types of cameras that can be used with the system, privacy filters are actually evaluated by the camera adapter.

When the camera adapter is initialized, it takes its initial configuration from a configuration file. The initial state of its privacy filter can be specified in this file and will persist for the lifetime of the adapter's state. (When the LVDBMS is in operation, an operator may specify new default privacy filter settings.) A camera's privacy filter is maintained in the camera adapter, as sketched below.
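The prototype reads its settings from XML files (Sect. 4.1); a hypothetical adapter configuration fragment, with an invented schema, might look like the following:

<!-- Hypothetical camera adapter configuration; the element names are
     illustrative, not the prototype's actual schema. -->
<cameraAdapter id="C1">
  <defaultPrivacyFilter>
    <target>NonQueryTargets</target>
    <temporalScope>Permanent</temporalScope>
    <objectScope>Dynamic</objectScope>
  </defaultPrivacyFilter>
</cameraAdapter>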

Example 1: If a camera has a privacy filter with an attribute set to None, then by itself that filter will have no effect. However, when combined with the privacy filter of an active query, it can elevate the privacy state. For example, a camera with a default privacy filter of {All, None, None} will not result in any effective privacy state when a query is evaluated with its images. However, a query with privacy filter {QueryTargets, QueryActive, Dynamic} will yield the effective privacy state {All, QueryActive, Dynamic}. The difference is that all dynamic objects will be obscured, instead of only those that are query targets.

The attribute values of a privacy filter apply at the camera level as follows. The Target parameter specifies whether the filter applies to objects that are the targets of active queries (QueryTargets), objects that are not targets of active queries (NonQueryTargets), all objects defined over the camera, whether static, dynamic, or cross-camera dynamic (All), or no objects (None). PreviouslyMasked refers to objects that previously qualified for inclusion in a privacy filter (i.e., they were a non-query target in a camera with the NonQueryTargets attribute set, were a query target under the QueryTargets attribute, etc.). We note that, from the perspective of a camera, an active query is a query that (1) has been issued to the LVDBMS system; (2) is not expired; (3) has not evaluated to a condition that executes an action causing the query to terminate; and (4) has an operator that takes as input object(s) from said camera.

A query target is an object that appears in the field of view of a camera and satisfies two conditions:

Table 1 Privacy filter values for parameter type Target

Attribute         Description                                              Priority
None              No privacy                                               1
QueryTargets      Targets of active queries are obscured                   2
NonQueryTargets   Objects that are not targets of active queries are       2
                  masked; an active query may obscure their identity
PreviouslyMasked  Objects that were previously masked continue to be       2
                  masked
All               All object identities are masked, regardless of          3
                  query status

Table 2 Privacy filter values for parameter type TemporalScope

Attribute        Description                                               Priority
None             No privacy                                                1
QueryNonActive   Privacy settings apply only when a query is not active    2
QueryActive      Privacy settings apply only when a privacy-enabled        2
                 query is active (in the case of privacy applied to a
                 camera, for example)
Permanent        Privacy settings apply for the lifetime of the object,    3
                 camera, or query

Table 3 Privacy filter values for parameter type ObjectScope

Attribute            Description                                           Priority
None                 No privacy (no relevant objects qualify)              1
CrossCameraDynamic   Objects that are first detected in another camera     2
Dynamic              Dynamic (automatically detected) objects              2
Static               Static (user-defined) objects                         2
All                  All classes of objects qualify                        3


(1) it is a static object, dynamic object, or cross-camera dynamic object associated with said camera, and (2) it is referenced (directly in the case of a static object, or indirectly in the other two cases) by an operand in an active query over the camera in which it appears. In Fig. 8, the dynamic object D121 is contained within an active query and is a QueryTarget; D102 is not associated with a query and has the status NonQueryTarget.

The ObjectScope privacy filter attributes are the different classes of objects identified by the LVDBMS and explained previously. The TemporalScope attribute of a privacy filter applies at the camera level as either None or Permanent; we currently do not support a more granular temporal operator.

    3.8 Privacy filters applied to queries

Privacy filters may also be associated with queries. A default privacy filter may be configured at the system level (in the configuration corresponding to the stream processing layer host), or one may be specified when the query is instantiated. Once a query is associated with a privacy filter, that privacy filter is retained with the query for the lifetime of that query. A query's privacy filter is kept in the stream processing layer Query Executive along with other query metadata, such as the number of sub-queries the query has been decomposed into, which stream processing layer hosts the query has been sent to, etc.

When applied to a query, the TemporalScope attributes (Table 2) have only two distinct behaviors, None and Permanent (QueryActive is treated as equivalent to Permanent, since both refer to the lifespan of the query).

Example 2: A Department of Transportation (DOT) Traffic Management Center (TMC) makes available live video feeds from cameras mounted along major roadways for broadcast on nightly television news segments. The TMC provides these video feeds to allow the public to observe traffic pattern trends in real time (such as how quickly traffic is moving across a particular bridge), and for news broadcasters to announce traffic incidents causing lane blockages. However, TMC personnel do not want to embarrass individuals involved in specific traffic incidents, or to broadcast identifiers such as license plate numbers. The TMC objective is to provide real-time video with a blur applied to all objects. Object types (e.g., car, truck, and pedestrian) and colors can be distinguished by observers, but not individual identifiers such as faces and license plate numbers (Fig. 9). (Note that only TMC personnel can control camera functions such as zoom.) This is accomplished by creating a query with an Appear() operator and a corresponding privacy filter {None, None, All}. A privacy filter must be associated with the query because, by default, the query would run with the privacy filter of the user who created it; since a TMC operator does not have her view restricted, a query from a TMC operator would otherwise have no privacy filter applied to it.
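Expressed with the constructs above, the canned broadcast query might look roughly like the following; the camera identifier and action name are hypothetical, and the query would be instantiated with the privacy filter {None, None, All} rather than the operator's own filters:

ACTION StreamToBroadcastFeed
ON EVENT Appear(C12.*)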

    3.9 Privacy filters applied to users

Default privacy filters may also be defined that apply to users of the system who connect through the client GUI and can browse video cameras. When a user's GUI connects to the stream processing layer host, the connection is registered with the Session Manager, which records connectivity information such as the client IP and port, starts a heartbeat service, and associates a privacy filter with the registration if necessary. The heartbeat service keeps track of clients who are connected to the system and deallocates resources for clients who disconnect. Clients can run queries which do not have actions specified, but which continuously return evaluation results to the GUI for the user to watch. Such queries will be aborted if the corresponding client remains disconnected for a period of time. When applied to a user, the TemporalScope attributes (Table 2) have only two effective behaviors, None and Permanent (where Permanent refers to the time the client is connected to the LVDBMS). The other attributes behave as described previously.

Example 3: A security monitoring application is written for the LVDBMS. It has a predefined set of queries the security guards can select from and view on their GUI. They can also monitor any camera associated with the system, but only see the identities of people who satisfy the query conditions (for example, someone who has been standing in the same place for more than 5 min). If a security guard watches that video feed, the person who has not moved will be unobstructed, but people walking nearby will have their images masked.

Fig. 9 The two dynamic objects depicted here have their details obscured by the privacy filter


After 5 min, a query action is triggered that records the video; this query is not associated with the security guard's privacy filter and records the entire camera view without obstructions.

    3.10 Combining privacy filters

Privacy filters associated with cameras, users, and queries must all be combined to determine which objects in which frames of the video will have their identities obscured when the video is viewed by a user or saved to a persistent file. When a user requests to view video from a camera, the user's privacy filter is sent to the corresponding camera adapter. It is combined with the camera's and view's privacy filters (if applicable), and then the user's GUI connects directly to the camera adapter to receive live video. In the case of a query, the camera's privacy filter is sent as metadata, along with the image descriptor, to the corresponding spatial processing layer host. If a query action requires a live video stream (e.g., to save it to disk or direct it to a video monitor), then the query's privacy filter will be pushed down to the spatial processing layer host which is connected to the camera adapter.

When privacy filters interact at multiple levels, the effective privacy filter must be calculated. Each attribute in the privacy filter 3-tuple has a value which is assigned a priority (or no value is specified, in which case that attribute is not factored into the privacy calculation). When combining attributes, the highest-priority attribute value is taken, where a higher priority corresponds to more object observations being redacted from the output video stream. Attribute priorities are specified in the Priority column of Tables 1, 2, and 3. When combining privacy filters, if they have different attribute values with the same priority for the same attribute, then the effective attribute value chosen is the value at the next higher priority for that attribute. This procedure must be repeated each time a new query becomes associated with a camera object, or a query expires.

Example 4: Given a camera object with privacy filter {NonQueryTargets, QueryActive, CrossCameraDynamic} and a query object associated with the camera with filter {QueryTargets, QueryActive, Dynamic}, the effective privacy filter will be {All, Permanent, All}. That is, the priorities are {2,2,2} and {2,2,2}; where the tuples have different attribute values of equal priority, this is reconciled by giving the attribute the value at the next higher priority.
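A sketch of this combination rule follows (C#, reusing the enums sketched in Sect. 3.6; the helper names are ours). Note that Example 4 yields Permanent from two identical QueryActive values, which suggests the level-specific equivalences (e.g., QueryActive treated as Permanent for queries, Sect. 3.8) are normalized before combination; the sketch implements only the core priority rule.

using System;

static class FilterCombiner
{
    // Priority per Tables 1-3: None = 1; All and Permanent = 3; others = 2.
    static int Priority<T>(T attr) where T : struct, Enum
    {
        string name = attr.ToString();
        if (name == "None") return 1;
        if (name == "All" || name == "Permanent") return 3;
        return 2;
    }

    // Combine one attribute of two filters: keep the higher-priority value;
    // on a tie between different values, escalate to the next higher priority.
    public static T Combine<T>(T a, T b) where T : struct, Enum
    {
        if (a.Equals(b)) return a;
        int pa = Priority(a), pb = Priority(b);
        if (pa != pb) return pa > pb ? a : b;
        foreach (T v in Enum.GetValues(typeof(T)))
            if (Priority(v) == pa + 1) return v;  // e.g. All or Permanent
        return a;                                 // already at the top priority
    }
}

With the Sect. 3.6 enums, Combine(Target.NonQueryTargets, Target.QueryTargets) escalates to Target.All, and Combine(TemporalScope.None, TemporalScope.QueryActive) yields TemporalScope.QueryActive, matching Examples 1 and 4 (up to the TemporalScope normalization noted above).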

    3.11 Tracking based upon a multifaceted object model

In order to implement the cross-camera dynamic object operand in LVSQL, we developed a camera-to-camera tracking technique based upon constructing an appearance model of the objects appearing in video streams. Objects are tracked from frame to frame using a traditional tracking technique (e.g., [32]), which we refer to as a frame-to-frame tracker since it tracks objects within a single video stream. When an object appears in a consecutive sequence of frames, the frame-to-frame tracker assigns a unique identifier to the object as part of the tracking process. A feature vector based upon the object's appearance is also calculated. An object is represented as a bag of multiple instances [6, 7], where each instance is a feature vector based upon the object's visual appearance at a point in the video stream. Therefore, an object can be viewed as a set of points in the multidimensional feature space, referred to as a point set (Fig. 10). Note that the k instances in a bag are derived from k samplings of the object, which are not necessarily taken from consecutive frames.

Fig. 10 Multifaceted object representation model in which an object is represented by its point set (i.e., feature vectors)

A FIFO database holds the multiple-instance bags of objects recently detected by the different cameras in the system. As a new observation becomes available, the bag is updated by adding the new instance and removing the oldest instance. As surveillance queries generally concern real-time events that have occurred recently, the FIFO database is typically very small, and in our prototype we implemented it as a distributed in-memory database system (distributed among the spatial processing layer hosts).

Cross-camera tracking is performed as follows. When an object is detected by a camera, its multiple-instance bag is extracted from consecutive frames in the video stream and used as an example to retrieve a sufficiently similar bag in the distributed object-tracking database. If there exists another bag sufficiently close, based upon the squared distance metric, then the two bags are considered to correspond to the same object. On the other hand, if the system does not find a sufficiently similar bag, the occurrence of this newly detected object is considered the object's first appearance in the system.

To support the retrieval operations, the distributed in-memory database needs to compute the similarity between bags. Given two bags of multiple instances

X = {x_1, x_2, …, x_k}  and  X′ = {x′_1, x′_2, …, x′_k},

where k is the cardinality of the bags, we compute their similarity as follows:

d_m(X, X′) = min_{(s_i, s′_i)} Σ_{i=1}^{m} ‖x_{s_i} − x′_{s′_i}‖²,

where the minimum is taken over pairings (s_i, s′_i) of instances from X and X′, m ≤ k is a tuning factor, and ‖x_{s_i} − x′_{s′_i}‖² is the squared distance between the two vectors. This distance function computes the smallest sum of pairwise distances between the two bags. Although we can set m = k, a smaller m value is more suitable for real-time computation of the distance function. For instance, if m = 1, two objects are considered the same if their appearances look similar according to some single observation. We set m = 5 in our study. Traditionally, each object is represented as a feature vector, i.e., a single point, instead of a point set, in the multidimensional feature space. This representation is less effective for object recognition. For clarity's sake, let us consider a simple case in which two different persons currently appear in the surveillance system. One person wears a 2-color t-shirt, white in the front and blue in the back. The other person wears a totally white t-shirt. If the feature vectors extracted from these two persons are based on their front view, the two would be incorrectly recognized as the same object. In contrast, the proposed multifaceted model also takes into account the back of the t-shirt and will be able to tell them apart. The bag model is thus more indicative of the objects.

The FIFO database is implemented as a series of queues and hash tables residing in the spatial processing layer hosts. Each host maintains indices (hash tables) over the bags of objects observed in video streams from the corresponding camera adapters. The indices associate objects with the specific video frames in which they were observed, video frames with video streams, objects with bags, objects with queries over the objects' corresponding video streams, etc. An object appearing in two separate video streams will have two separate bags in the index and two separate identifiers (the camera adapter identifier, local object tracking number, and host identifier concatenated into a string). If the two objects are determined to be the same object, their bags are merged and the index updated such that both object identifiers point to the same (merged) bag of observations.
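The merge step can be sketched minimally as follows, assuming a string-keyed hash index from object identifiers to bags (our own naming, not the prototype's):

using System.Collections.Generic;

static class TrackingIndex
{
    // When two identifiers are found to denote the same physical object,
    // merge their observation bags and point both identifiers at the result.
    public static void MergeBags(Dictionary<string, List<double[]>> index,
                                 string idA, string idB)
    {
        var merged = new List<double[]>(index[idA]);
        merged.AddRange(index[idB]);
        index[idA] = merged;
        index[idB] = merged;   // both object identifiers now share one bag
    }
}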

Example 5: Cross-camera tracking allows queries to be issued that consider multiple cameras to determine whether an event has occurred. For example, consider employees who work in a building that does not permit smoking inside, but has a back door and a bench next to the door for smokers to sit on. When someone comes out of the building and sits on the bench, we assume it is an employee. When someone comes down a nearby street and waits by the door, their motives are unknown to us and a security guard should be notified to observe the situation. In the LVDBMS, this can be accomplished by creating a query over both cameras with a cross-camera dynamic operand, to detect when someone appears in the street camera (which does not observe the bench or door) and then appears in the camera observing the back door and bench. Thus, query targets are objects appearing first in the one camera and then in the second. An associated privacy filter would obscure non-query targets (objects appearing in only one camera, or appearing in the back door camera and later in the street view camera). The privacy filter would be {NonQueryTargets, None, None} and the query Before(Appear(V1.#), Appear(V2.#)), where V1 shows the street and V2 the back door.

    4 Evaluation

This section describes the experimental conditions in which the LVDBMS software was evaluated, along with the experimental results. The focus of the LVDBMS is on real-time environments, as opposed to systems that operate only on pre-recorded video. Thus, periodic activities such as query evaluation must take less time than the interval at which queries are evaluated; otherwise, the query evaluation queue will grow without bound and query results will not be returned in a timely fashion. In addition, because video streams entering the system from cameras are unbounded (a camera can always be turned on and transmitting video), only a small amount of data can be retained within a sliding window before it must be discarded to make room for new data. Thus, implementing privacy protection in real time differs from, and is more challenging than, doing so off-line because (1) the time available to carry out data processing operations is bounded, and (2) due to storage limitations, only a small portion of the video data may be retained at any particular time in its raw (unsummarized) format and must be processed in a single pass over the data. In off-line processing, the video data is stored and can be processed with multiple passes over the data, for example, to create an index structure to be used in a later processing stage.

    4.1 Experimental setup

To test the effectiveness of the LVDBMS, we utilize three sets of videos, where each video set satisfies a different


    objective. We number the data sets from 1 to 3 as follows:

(1) We created a series of reference videos by placing cameras in three locations in a campus building: inside two laboratories (rooms) and in a hallway. Each laboratory has slightly different lighting with no external windows, and the hallway has exterior windows along one wall. This provides reference videos with changing lighting conditions, and the subjects at times are obscured by desks, chairs, and tables. This creates a challenging environment in which to track objects from one camera to another. This series of videos involved 5 people, with on average 2 or 3 people appearing in the field of view at any particular time.

(2) Videos from the CAVIAR project (http://homepages.inf.ed.ac.uk/rbf/CAVIAR/) are utilized. These are low-resolution videos that provide coverage of the same scene from two different views, front and side. This is challenging because the video resolution is small by today's standards, and the objects appearing in the videos have relatively few pixels to contribute toward building the appearance model (bag of feature vectors).

(3) We created a series of videos recording traffic on roads (cars, trucks, and a few pedestrians were observed). Automobiles are rigid objects that do not change shape while we observed them driving, and they move in patterns (constrained by the road). This scenario provides an excellent testbed for spatial operators, such as Appears, North, West, etc., with approximately perfect tracking accuracy within a video stream.

We evaluated the LVDBMS with pre-recorded videos so that the same conditions could be simulated with different configuration parameters. In our LVDBMS implementation, image frames are presented to the camera adapter in one of two ways: an initial processing thread either reads the image data from a memory buffer that is written to by a device driver for the camera hardware, or the image data are extracted from a video file by a video codec. Once extracted, the frame is enqueued in a new frame queue. A second processing thread retrieves frames from the new frame queue and proceeds to identify foreground pixels from background pixels, and so on. Once a frame of image data has been enqueued on the new frame queue, the frame's original video source is indistinguishable to the rest of the system processing pipeline and has no effect on query processing or other system behavior.
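A minimal sketch of this two-thread hand-off is shown below; the Frame and IFrameSource types are hypothetical stand-ins, and the actual implementation differs in detail.

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class Frame { /* image data */ }

    interface IFrameSource // a live camera driver buffer or a video codec
    {
        IEnumerable<Frame> ReadFrames();
    }

    class CameraAdapterPipeline
    {
        private readonly BlockingCollection<Frame> newFrameQueue =
            new BlockingCollection<Frame>(boundedCapacity: 64);

        public void Start(IFrameSource source)
        {
            // Thread 1: read frames from the driver buffer or codec and enqueue them.
            Task.Run(() =>
            {
                foreach (var frame in source.ReadFrames())
                    newFrameQueue.Add(frame); // origin is indistinguishable from here on
                newFrameQueue.CompleteAdding();
            });

            // Thread 2: dequeue frames and segment foreground from background.
            Task.Run(() =>
            {
                foreach (var frame in newFrameQueue.GetConsumingEnumerable())
                    SegmentForeground(frame);
            });
        }

        private void SegmentForeground(Frame frame) { /* background subtraction, etc. */ }
    }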

For these experiments, the frame-to-frame tracker is configured to ignore detected objects less than 200 pixels in area, which we consider noise (this parameter is configurable at the camera adapter level). Software in all tiers takes configuration settings from XML files and, to facilitate scripting, also accepts command line arguments.
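A camera adapter configuration file might therefore resemble the following; the element and attribute names here are hypothetical and are shown only to indicate where a setting such as the 200-pixel noise threshold would live.

    <!-- Hypothetical camera adapter configuration; names are illustrative only. -->
    <CameraAdapter id="cam1">
      <Source type="file" path="..." />
      <!-- Detections smaller than this area (in pixels) are treated as noise. -->
      <Tracker minObjectAreaPixels="200" />
    </CameraAdapter>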

LVDBMS core components are implemented in C# and utilize Language Integrated Query (LINQ) to maintain some internal queues and hash indexes. For the experimental results presented in this article, the LVDBMS server layers ran on a Windows 7 computer with a 3 GHz Pentium 4 CPU with Hyper-Threading and 3 GB RAM (Dell Precision 370), and the camera adapters ran on a 2.54 GHz Core 2 Duo Latitude E6500 with 4 GB RAM. For the eight-camera experiment discussed in Sect. 4.4.1, the camera adapters were hosted on a Windows 7 HP Pavilion laptop with a 2.3 GHz quad-core CPU and 4 GB RAM. We use Emgu CV, a .NET wrapper for the Intel OpenCV library, for low-level visual operators in conjunction with the Intel Integrated Performance Primitives (IPP) library.

    4.2 Effectiveness of privacy filters

The purpose of privacy filters is to prevent qualified objects in videos from being identified. In this section, we provide an example of how an object that is obscured by a privacy filter is displayed to users via the LVDBMS client. Figure 11 illustrates two separate situations where privacy filters obscure the identification of detected objects. On the left-hand side, a person is walking toward a door, and on the right-hand side a vehicle is traveling down a street in a traffic-counting application. In these examples, objects are obscured by blurring the pixels contained in the bounding boxes. Applying a blur maintains a visually appealing image in which obscured objects do not significantly stand out, but other options, such as an adaptive blur based upon the size of the bounding box, or simply setting the entire rectangle to a solid color such as black, are possible depending upon how much the appearance of the object should be obscured from the video stream (this choice is not a focus of this work). Additional privacy-preserving measures, such as increasing the size of the box to mask the size and shape of the object being protected, are another option, with the tradeoff of decreasing the utility of the video (as more of the video is obscured to the viewer).
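The following sketch illustrates the two simplest rendering choices over a raw 32-bit BGRA frame buffer, solid fill and average-color fill; it is a minimal stand-in for the actual rendering code (which uses Emgu CV), and the method signature is our own.

    // Illustrative only: obscure a bounding box in a 32-bit BGRA frame, either
    // with solid black or with the average pixel color of the region.
    // Assumes a non-empty rectangle fully inside the frame.
    static void ObscureRegion(uint[] pixels, int width,
                              int x, int y, int w, int h, bool useAverageColor)
    {
        uint fill = 0xFF000000; // opaque black
        if (useAverageColor)
        {
            ulong b = 0, g = 0, r = 0, n = (ulong)w * (ulong)h;
            for (int row = y; row < y + h; row++)
                for (int col = x; col < x + w; col++)
                {
                    uint p = pixels[row * width + col];
                    b += p & 0xFF; g += (p >> 8) & 0xFF; r += (p >> 16) & 0xFF;
                }
            fill = 0xFF000000 | ((uint)(r / n) << 16) | ((uint)(g / n) << 8) | (uint)(b / n);
        }
        for (int row = y; row < y + h; row++)
            for (int col = x; col < x + w; col++)
                pixels[row * width + col] = fill;
    }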

Privacy filter effectiveness is a function of the effectiveness of the object detection logic and, depending upon the query, the tracking logic. In the left image in Fig. 11, the person walking satisfies the query condition Appears(), as do false positives (FPs) identified by the background segmentation algorithm due to a person walking through a door and the door closing. In this case, the relevant object's identity is obscured, as are four FP areas (which can be reduced by adjusting camera parameters), and the privacy condition is satisfied. In this section, we do not provide a separate table of privacy evaluation results because, had the queries presented in Table 4 had privacy filters enabled, the privacy filter effectiveness would be exactly the tracking accuracy presented in the third column of Table 4.

The accuracy of privacy filters in correctly obscuring the appearance of an object from a video stream is related directly to the specification of the privacy filter and, if it is associated with a query, the query. For example, a privacy filter that obscures all objects in a video stream depends upon the object segmentation algorithm to correctly distinguish objects from the video background. Today's background segmentation algorithms are very accurate, and in the cases where they err (such as complex moving backgrounds), the errors would be incorrectly considered objects and have their appearances obscured. Likewise, tracking algorithms that track objects viewable in consecutive frames of video in a single camera are also accurate, but less accurate than the simple foreground/background extraction. Similarly, tracking objects from one camera to another is a more difficult problem and, as indicated in Table 4, is less accurate still. If two objects appear visually the same in two cameras, it is a difficult problem to determine whether they are in fact the same object without an additional aid such as a security ID card with an RFID tag. To maximize privacy in these latter situations, one could construct a privacy filter that obscures all visible objects (rather than filtering based upon query target or non-query target), thus minimizing the effect of query accuracy. If an object has an appearance that is sufficiently similar to the background, then it would not be detected by the background segmentation algorithm and would not have a privacy filter applied to it. As soon as it moves or its appearance changes such that it looks sufficiently distinct from the background around it, it is recognized as a salient object by the LVDBMS and any applicable privacy filters apply to it.

    4.3 Privacy filter demonstration

This section provides several scenarios showing the application of privacy filters. The first demonstrates a transportation application in which a Transportation Management Center (TMC) operator's terminal and a live news feed originate from the same traffic camera. The query, Appears(v1.*, 250), which monitors for objects in the video stream sized 250 pixels or greater, is active on the camera, v1. The live news feed is served through a view that has associated with it a privacy filter specifying that query targets should be obscured, as illustrated in Fig. 12.

Figure 13 shows a screenshot of the traffic camera video feed as the TMC operator would observe it. Through the LVDBMS, the operator views images from the camera that are not associated with any privacy filters. The live video provided for television, however, has access only to video obtained from the view, view1. This view has a privacy filter associated with it, which applies to all objects that are query targets, that is, objects that might contribute to a query evaluating to true. This privacy filter has an effect only when a query is active (in this example the query monitors for objects appearing in the video stream that are larger than a specified area in pixels). Figure 14 provides an example of video observed through view1 with the query active.

Fig. 11 Examples of objects' identities obscured by privacy filters. The left image is from data set (1) and the right from data set (3)

Table 4 Continuous query evaluation results

Query name | Description | Accuracy
Appear | True if an object with area greater than 100 pixels appears in the frame, else false | 100%
North before south | True if there exists an object moving with downward motion. The Before operator has a window size of 20 frames; if the object stops or changes direction for fewer than 20 frames it is still considered true | 100%
Appear across cameras | A person appears in camera 1 and then is recognized when they appear in a second camera | 83% (TP = 20, FN = 4)
Appear, then cover across cameras | An object appears in camera 1 then goes through a door (outlined by a static object) in the second camera | 91% (TP = 22, FN = 2)


The next example (Fig. 15) demonstrates privacy filters at multiple levels of the LVDBMS hierarchy. A name plaque is mounted in a corridor and must be obscured in video streams sent to all consumers of this video source. To accomplish this, a static object is defined over the plaque and a privacy filter is associated with the camera (Fig. 16). This camera-level privacy filter will be propagated to all consumers of this video stream and factor into the effective privacy filter computation for each consumer's video stream.

The camera is accessed by two users, User 1 and User 2. User 1 is not explicitly assigned a privacy filter, and User 2 has been assigned a privacy filter that is applicable to all objects of type dynamic. (For example, User 2 might only need to recognize general activity and notify a supervisor, User 1, when a closer review is required.) When the video is viewed from User 1's terminal, the camera-level privacy filter is propagated and applied to the static object drawn around the plaque. The result is that the plaque is obscured in the video output on User 1's terminal (Fig. 17). User 2 is explicitly assigned a privacy filter that applies to all dynamic objects in the video stream. The privacy filter on dynamic objects is propagated to User 2 by the LVDBMS automatically. When the frames are rendered into the video stream for User 2, the privacy filters are combined (per the discussion in Sect. 3.10) and the effective privacy filter applies to all objects identified in the video stream, as illustrated in Fig. 18. Both terminal images in Fig. 18 are from the same video stream, but illustrate two different system configuration settings, depending upon how much detail should be revealed about obscured objects. The upper image has a blur operator applied, which blurs identifying features while providing the operator substantial visual information with which to observe behaviors. The lower terminal image simply applies a bounding box filled with the average pixel color for the region to obscure.
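As a simplified sketch of this combination rule (our own representation, not the LVDBMS data structures), the effective filter can be computed as the union of all filters in the camera/view/user cascade, so the most stringent specification always prevails:

    using System;

    // Illustrative only: object scopes that a privacy filter may target.
    [Flags]
    enum ObjectScope { None = 0, Static = 1, Dynamic = 2, QueryTargets = 4, NonQueryTargets = 8 }

    static class PrivacyFilters
    {
        // Any object matched by any filter in the cascade is obscured.
        public static ObjectScope Effective(params ObjectScope[] cascade)
        {
            var effective = ObjectScope.None;
            foreach (var filter in cascade)
                effective |= filter;
            return effective;
        }
    }

For the scenario above, Effective(ObjectScope.Static, ObjectScope.None) obscures only the plaque for User 1, while Effective(ObjectScope.Static, ObjectScope.Dynamic) obscures every identified object for User 2.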

Relevant object | Privacy filter
Camera | None
User 1: TMC operator | None
User 2: News station live video feed | None
View1 | Target=QueryTargets

Fig. 12 Transportation example showing a video source, v1, providing live video which is consumed by a TMC operator and live television. The accompanying table shows privacy filters associated with various objects in this scenario

Fig. 13 Unmodified video originating from camera, v1, as viewed by the TMC operator, who does not have a privacy filter

Fig. 14 Live video as viewed through the view view1 with privacy filter and active query

4.4 Query evaluation accuracy for event monitoring tasks

Two important aims of the LVDBMS are overall usability and the ability to specify privacy policies in terms of privacy filters.

To be relevant to surveillance applications, the ability to define queries that detect noteworthy events is important. In addition, privacy, and the ability to maintain some level of privacy for objects (i.e., people, identifiable automobiles, etc.) as surveillance systems become more automated and pervasive, is also important. Thus, in this study, we redesigned the LVSQL query language to be more concise and easier to use. Query accuracy is the accuracy with which the LVDBMS detects user-posed events of interest. In the experiments in this section, we test the ability of the LVDBMS to correctly interpret and evaluate four continuous queries, i.e., whether the conditions in the video are true when the query indicates a true condition. Query results are tabulated by manually monitoring the videos taken from dataset (1) and the query results in the LVDBMS GUI, and recording the outcome every 5 s (by incrementing TP, TN, FP, or FN). Each query is evaluated over a 2-min period.

Relevant object | Privacy filter
Camera | ObjectScope=Static
User 1 | None
User 2 | ObjectScope=Dynamic

Fig. 15 Surveillance scenario illustrating the application of multiple layers of privacy filters to different types of objects

Fig. 16 Unobscured view from camera indicating mounted name plaque

Fig. 17 Video stream, as observed by User 1, with the static object obscured with a solid pattern. The manner in which an object is removed from the video stream (solid box or blur) is configurable

Fig. 18 Video stream, as observed by User 2, with blur (upper) and solid (lower) patterns obscuring objects in the video stream


As expected, the accuracy for queries involving a single video stream is extremely high. The accuracy of queries that correlate objects across multiple camera views is related to the accuracy of the underlying cross-camera tracking infrastructure, as reflected in the two cross-camera query experiments. The short (2-min) experiments allow only a few instances of each object to be observed and reflected in the index; however, even with only a few bags representing objects in the index, there were no FPs or false negatives (FNs) that could be attributed to mis-associating an object in one video with the wrong object in the other video. To determine query evaluation performance, we constructed four queries, two single-camera queries and two multi-camera queries, and present the results in Table 4.

    4.4.1 Query processing performance

The resolution of a continuous query is the frequency at which it is evaluated. In order to be usable in a real-time system, query processing must be completed within a bounded amount of time so that query evaluation does not become backlogged (and thus out of sync with the video image a user might observe) with respect to frames from streaming video, index updates, etc. We evaluate the performance of the query processing engine by simultaneously evaluating five queries for a period of 120 s over a random selection of ten videos. Figure 19 provides the results of evaluating the five continuous queries over each video, with results combined into a single plot. Table 5 presents summary metrics for the data plotted in Fig. 19, normalized per query (divided by five). The dataset each video came from is indicated after the video's name in the table.

Note that the cost to evaluate a query is a function of the input to its respective operators; some operators, such as AND and OR, implement short-circuit evaluation and only evaluate the second argument if the value of the first is insufficient to determine the operator result. The data reported in Table 5 are based upon five queries and one video. We repeated this experiment with eight simultaneous videos, and the performance results were relatively unchanged from those in Table 5. For the resolution of video utilized for this experiment, 8 is the maximum number of camera adapter instances that could be run on a 4-core host without the frame processing rate dropping to an unacceptable level (we consider approximately five frames per second manageable, but lower processing frame rates could lead to object segmentation and tracking errors, for example). Once the video stream has been processed by the camera adapter, the corresponding spatial processing layer host receives a stream of object size and position updates, and video frames. The frame-to-frame tracking and background segmentation processes that occur in the camera adapter processing pipeline are the most CPU-intensive stage in the data flow of the LVDBMS system. Compared to the video data received by the camera adapters, the quantity of data that flows to the spatial processing layer, and then to the stream processing layer, is substantially less at each phase. The spatial processing layer performs index updates and query evaluations, which are not CPU intensive, and in turn sends sub-query evaluations to the stream processing layer at the resolution of each query (e.g., once each second).

Fig. 19 Cost to evaluate five simultaneous queries in terms of CPU time. The plot shows CPU time in milliseconds versus elapsed time in seconds, with one series per video (OneShopOneWait1front, ShopAssistant2cor, SR436_M2U00040, TwoEnterShop1cor, TwoEnterShop1front, TwoEnterShop3cor, TwoLeaveShop1cor, TwoLeaveShop2cor, Walk2, WalkByShop1cor)

Table 5 Average query evaluation cost in terms of CPU time, per query, in milliseconds

Movie | Min | Max | SD | Avg
SR436_M2U00040 (3) | 0.40 | 5.60 | 0.73 | 0.78
OneShopOneWait1front (2) | 0.40 | 30.81 | 6.26 | 4.58
ShopAssistant2cor (2) | 0.40 | 21.24 | 2.72 | 2.49
TwoEnterShop1cor (2) | 1.60 | 12.00 | 1.68 | 2.42
TwoEnterShop1front (2) | 3.40 | 26.60 | 3.05 | 5.97
TwoEnterShop3cor (2) | 0.40 | 7.00 | 0.90 | 0.87
TwoLeaveShop1cor (2) | 1.40 | 15.60 | 2.75 | 3.47
TwoLeaveShop2cor (2) | 0.40 | 14.00 | 1.86 | 1.38
Walk2 (2) | 0.40 | 1.80 | 0.40 | 0.72
WalkByShop1cor (2) | 0.40 | 3.00 | 0.49 | 0.73



The data presented in Fig. 19 and Table 5 show results from a mixture of queries that evaluated within a period of time well below the query resolution of 1 s throughout the evaluation period. Query evaluation entails computing operator values, which requires operand lookups within index structures, and finally updating metadata for objects to indicate query targets. What we want to emphasize with these results is that, over a wide variety of input videos, query execution on average is well below the 1 s query resolution. Had query execution exceeded 1 s, query results would be out of sync with the video frames presented to the user via the client.

4.5 Multi-camera object tracking for privacy filter correctness

This section provides camera-to-camera tracking results from our tracking technique, which uses a multifaceted object model built from an object's appearances in video streams. An essential feature of the privacy framework is the ability to construct a query and use it to either include or exclude dynamic objects from a privacy filter. Thus, object tracking and cross-camera object tracking (when a privacy filter or corresponding query is formulated to make use of such functionality) correlate positively with privacy filter accuracy.

Figures 20 and 21 present accuracy results from two sequences of videos, from dataset (1), that involve tracking people across a three-camera setup in a laboratory environment as described in Sect. 4.1. In order to maximize the number of results to present, for this section we query the index for each observation of each object in each frame of video. That is, for each frame, we query the index for the first nearest neighbor of the query point (i.e., the object's feature vector) and return the result if it is sufficiently close; otherwise nothing is returned. If a result is returned and it is the correct object, true positive (TP) is incremented; else false positive (FP) is incremented. Likewise, if no result is returned, true negative (TN) is incremented if the object is not in the index; else we increment false negative (FN). Next, the bag corresponding to the object is updated to include the currently queried instance (based upon the cluster identifier assigned by the frame-to-frame tracker). This process is repeated for each frame in the video.

We present accuracy, computed from true positives (TP), false positives (FP), and false negatives (FN):

Accuracy = TP / (FP + FN + TP).

For example, TP = 20, FP = 0, and FN = 4 give 20/24 ≈ 83%, consistent with the cross-camera Appear query in Table 4.

(The accuracy equation does not consider TN because if an event does not occur it will not be detected. Furthermore, if an event does not occur and we claim that it did occur, that is considered a FP, which is a factor in the equation.)

As we see from the accuracy indicated in Figs. 20 and 21, initially the feature space is sparse and the bag representations contain few points (and thus small corresponding standard deviations along the various dimensions). As more observations are added to the bags in the index, the bag representations become more indicative of what we are likely to observe of a particular object in the future, and the accuracy stabilizes. The object-tracking technique we present is based upon the visual appearance of an object. Even though a FIFO queue is utilized to limit the duration of time an object is taken into consideration for tracking purposes, when many objects appear in video streams simultaneously, the likelihood increases that some of the objects will look sufficiently similar that they may be mistaken for one another, resulting in decreased accuracy.
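A sketch of this per-frame evaluation loop is given below; the Observation, Match, and IObjectIndex types are hypothetical simplifications of the LVDBMS index structures.

    using System.Collections.Generic;

    class Observation { public double[] FeatureVector; public int TrueId; }
    class Match { public int ObjectId; }

    interface IObjectIndex
    {
        Match NearestNeighbor(double[] query, double maxDistance); // null if nothing is close enough
        bool Contains(int objectId);
        void AddToBag(int objectId, double[] featureVector);
    }

    static class TrackingEvaluation
    {
        public static double Evaluate(IEnumerable<Observation> observations,
                                      IObjectIndex index, double maxDistance)
        {
            int tp = 0, fp = 0, tn = 0, fn = 0;
            foreach (var obs in observations) // one observation per object per frame
            {
                var match = index.NearestNeighbor(obs.FeatureVector, maxDistance);
                if (match != null) { if (match.ObjectId == obs.TrueId) tp++; else fp++; }
                else { if (!index.Contains(obs.TrueId)) tn++; else fn++; }
                index.AddToBag(obs.TrueId, obs.FeatureVector); // grow the object's bag
            }
            return (double)tp / (fp + fn + tp); // accuracy, as defined above
        }
    }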

    5 Related work

An LVDBMS encapsulates work from a multitude of domains, including continuous query language development and computer vision techniques such as object detection and tracking. For completeness, we include a review of recent video surveillance-related topics.

Fig. 20 Cross-camera tracking accuracy for sequence 44

Fig. 21 Cross-camera tracking accuracy for sequence 46

    5.1 Privacy considerations

As cameras become pervasive, improved video surveillance systems will be required to overcome the limitations imposed by direct and continuous human monitoring. This will result in increasing volumes of video that are processed, published, monitored, and stored. References [5, 10] suggest that privacy is a function of what is deemed culturally and socially acceptable by society. Several privacy-aware systems have been developed that can detect movement and mask it for privacy considerations. For example, in [27] pedestrians are obscured with multicolored blobs, where the color specifies a status, such as having crossed a virtual trip wire. Reference [15] develops an MPEG-4 transcoder and decoder to mask objects in a video stream based upon movement. While these systems increase privacy by masking an object's identity, they are not helpful in fighting crimes because the obscurity is irreversible. Furthermore, they do not provide functionality to determine whether an object should indeed be masked in the output video stream.

Large collections of data provide data mining opportunities for discovering global trends, decision making, capacity planning, building machine learning classifiers, etc. Data in its original form, such as hospital patient demographic data, contains information that can violate individual privacy. Privacy-preserving data publishing (PPDP) proposes algorithms to make data available for mining global trends while preserving individual privacy. These techniques range from monitoring the individual queries issued to perturbing the data in various ways. For example, an attacker might try to identify a patient's record in a public data set.

The majority of research on privacy control methods focuses on statistical databases containing tabular data. Security control methods generally entail query restriction, data perturbation, and output perturbation [2]. Query restriction entails monitoring queries, e.g., the number of queries submitted by a particular user, the amount of overlapping data queried by a user, etc. Data perturbation entails modifying data values stored in the database, such as replacing the ages of people with the average age by zip code. Output perturbation involves injecting error into the query result. Thus, there is a tradeoff between accuracy and confidentiality: inducing higher error lowers the likelihood of identifying particular data values but produces more skewed aggregate results.

When privacy filters are applied to video streams, the effect is a type of data perturbation. In an ideal scenario, the modified streams should not reveal anything about the individuals appearing therein [5]. However, [8] has shown that an absolute guarantee of privacy is unachievable in the presence of external auxiliary information. Some recent works, such as [26], investigate identity leakage through implicit inference channels, such as time of day combined with camera location. For example, if a camera shows an office door and one observes a blurred figure entering at 8 a.m. and leaving at 12 p.m., one can assume the obscured person and the person assigned to that office are the same. Thwarting this type of attack on privacy is beyond the scope of the method we propose in this study. Our primary aim is to make objects appearing in a video stream indistinguishable from one another in accordance with the current privacy specification. We note, however, that with our framework, identifiers such as office door numbers and placards can be defined as static objects and an appropriate privacy specification can be defined to redact them from the output video stream.

In this study, we present a flexible privacy framework with the goal of protecting individual privacy while providing data streams that can be queried for events as accurately as possible. Thus, we choose to perturb the output data in some ways (i.e., obscure objects with bounding boxes of varying degrees of tightness) but not others (such as skewing the video in the time domain, adding ghost objects to hide when real ones appear, etc.).

    5.2 Object detection and tracking

There are many existing video surveillance systems over networked cameras, e.g., [3, 12]. In particular, object recognition and tracking is a core component of these systems, forming a basis for high-level analytic functions for scene understanding and event detection. Since cameras have a limited resolution and field of view, multiple cameras may be required to provide coverage over the area of interest. Typically, the fields of view of adjacent cameras do not overlap due to economics, the environment, or computational constraints. These practical factors make tracking objects moving across multiple cameras a great challenge.

Existing multi-camera tracking environments [14, 18, 19, 21, 22, 28, 30, 32] require various types of calibration and/or information on the spatial relationships between the various cameras to be configured into the system as known parameters. They assume overlapping fields of view of the cameras, or non-random movement patterns. In the latter scenario, when an object moves from the field of view of one camera into that of the next camera, the object can be recognized in the second camera by taking into consideration the speed and trajectory of the object when it exits the field of view of the first camera [21, 22]. This strategy is only applicable to non-random movement patterns such as objects constrained by roads, walls, etc., and cannot be used for general-purpose a