
REGULAR PAPER

A general framework for managing and processing live video data with privacy protection

Alexander J. Aved · Kien A. Hua

© Springer-Verlag 2011

Abstract  Though a large body of existing work on video surveillance focuses on image and video processing techniques, few address the usability of such systems, and in particular privacy issues. This study fuses concepts from stream processing and content-based image retrieval to construct a privacy-preserving framework for rapid development and deployment of video surveillance applications. Privacy policies, instantiated as privacy filters, may be applied both granularly and hierarchically. Privacy filters are granular as they are applicable to specific objects appearing in the video streams. They are hierarchical because they can be specified at different levels of the framework (e.g., users, cameras) and are combined such that the disseminated video stream adheres to the most stringent aspect specified in the cascade of all privacy filters relevant to a video stream or query. To support this privacy framework, we extend our Live Video Database Model with an informatics-based approach to object recognition and tracking, and add an intrinsic privacy model that provides a level of privacy protection not previously available for real-time streaming video data. The proposed framework also provides a formal approach to implementing and enforcing privacy policies that are verifiable, an important step towards privacy certification of video surveillance systems through a standardized privacy specification language.

Keywords  Query language · Privacy framework · Video database system · Real-time · Object recognition · Object tracking

    1 Introduction

Camera networks have been the subject of intensive research in recent years, and can range from a single camera to a network of tens of thousands of cameras. Usability is an important contributing factor to the effectiveness of such networks. As an example, a camera network in London cost £200 million over a 10-year period. However, police are no more likely to catch offenders in areas with hundreds of cameras than in those with hardly any [29]. This phenomenon is typical for large-scale applications and can be attributed to the fact that the multitude of videos is generally not interesting, and constant manual monitoring of these cameras for occasional critical events can become fatiguing. Due to this limitation in real-time surveillance capability, cities mainly use video networks to archive data for post-crime investigation.

To address operator fatigue and increase the effectiveness of the camera network, automatic video processing techniques have been developed for real-time event detection under various specific scenarios. These systems, such as the live video database discussed in this study, can monitor live video streams in real time for events of interest and can alert human operators upon their detection. However, pervasive monitoring by corporate and governmental entities can lead to privacy concerns. For example, archived video from a police camera network could later be used for purposes other than those for which it was originally collected. Public information laws could make video footage collected for legitimate purposes available to anyone who requests it. In a corporate setting, cameras deployed to record customer behavior might capture employees after their work shift has ended.

Deploying a sizable camera network entails a significant monetary investment.

A. J. Aved (✉) · K. A. Hua
University of Central Florida, Orlando, FL, USA
e-mail: [email protected]

K. A. Hua
e-mail: [email protected]

Multimedia Systems, DOI 10.1007/s00530-011-0245-x


For aesthetic reasons it is also desirable to minimize the number of cameras deployed (e.g., in a historic district of a town). For reasons such as these, it would be beneficial if multiple entities could share access to the cameras. Police could monitor for suspicious activity and crime scene evidence, utility companies could gauge outages after inclement weather, and sanitation departments could assess the productivity of new employee vehicle operators, to name just a few possible collaborators. Possible benefits entail shared deployment costs, and the possibility of providing service to stakeholders who otherwise could not justify the expense of a single-purpose camera network deployment on their own.

However, shared usage and monitoring by indeterminate or changing interests could lead to significant privacy concerns. Thus, to address the usability and privacy concerns of a general-purpose camera network, three factors are important:

1. The software system must support ad hoc monitoring tasks. An event of interest in one domain such as transportation monitoring is generally different from another domain such as crime prevention. Events of interest can also vary significantly between individual users.

2. It is desirable to provide the capability to enable rapid development of customized applications for domain-specific users, such as counting cars along a section of highway, or informing an employee when a nearby conference room becomes available early.

3. People are concerned about privacy and there are increasing objections to pervasive monitoring. For applications that run atop a camera network, there is a need for policies which specify the level of privacy they adhere to, and mechanisms which implement said policies.

To achieve the first two factors, we have designed and implemented a general-purpose Live Video Database Management System (LVDBMS) [25] as a platform to facilitate live video computing. It allows automatic monitoring and management of a large number of live cameras. The cameras in the network are treated as a special class of storage, with the live video streams viewed as database content. The user is able to specify an ad hoc video monitoring task by formulating a query that describes a spatiotemporal event. When the event occurs and is detected by a monitoring query, an action associated with the query is executed. This general-purpose LVDBMS also enables rapid development of live video applications, much as database applications are developed atop standard database management systems today. Another work that allows the user to specify semantically high-level composite events for video surveillance is presented in [31]. However, this technique requires the user to formulate queries procedurally using low-level operators. In contrast, our query language is declarative: the user only defines the event, and the system automatically generates the corresponding query processing procedure.

To address the third factor mentioned previously, we present in this article a privacy framework for the LVDBMS. This framework implements a privacy specification language (PSL) that permits privacy policies to be specified, and enforced by removing identifiable information pertaining to objects from the video streams as they are made available externally by the system. Example consumers of streaming video could be a file for storage, an operator's video terminal, or a live video feed captured for use in a traffic report on a television news broadcast. This facility allows the user to specify various privacy views atop the raw video stream to remove objects from the output stream, with the aim of protecting individual privacy while retaining general trends evidenced in the video stream such that further scene analysis is meaningful. Here objects refer to a person, animal, or vehicle that is not part of the video background. These objects are characterized using a multifaceted object model. As part of the new extension to our prototype, we also introduce in this article an informatics-based cross-camera tracking technique. This scheme permits queries to be defined that span multiple video streams.

The remainder of this article is organized as follows. In Sect. 2, we give an overview of our LVDBMS to make the article self-contained. The proposed privacy framework is introduced in Sect. 3, with experimental results in Sect. 4. Related work is discussed in Sect. 5. Finally, we conclude this article in Sect. 6.

    2 LVDBMS environment

In this section, we briefly introduce the LVDBMS, a distributed video stream processing Live Video Database (LVD) environment, and refer the reader to [25] for further details. Large networks of cameras produce a proliferation of multimedia information, and there is a profound need to manage and organize the volumes of video data into proportions relevant for human consumption. In an environment with numerous video cameras, the goal is to provide human operators with a facility to specify relevant scenes of interest, minimizing exposure to uninteresting and irrelevant scenes. However, today there is a technological gap between what state-of-the-art software technologies can provide in terms of identifying rich content, and consumer expectations. The LVDBMS allows users to mine numerous video streams for events of interest and perform actions when the specified scenarios are encountered.


    2.1 LVDBMS architecture

The LVDBMS is a distributed video database management system. Operators interact with the highest layer, and view video streams and query results through computer terminals (Fig. 1). Figure 2 illustrates the four logical tiers implemented in the LVDBMS architecture, which communicate through web services interfaces. Multiple cameras in the camera layer may be associated with a single host in the spatial processing layer. Spatial processing layer hosts perform real-time motion imagery processing for abstract object recognition and partial query evaluation (dependent upon the data available at the host). Intermediate query results are streamed to a host in the stream processing layer, which periodically computes the final query result that is made available to clients in the client layer. We discuss these software layers in more detail in the following sections.

    2.1.1 Camera layer

The camera layer (Fig. 3, left) consists of physical devices that capture images. Each camera is paired with a camera adapter which runs on a host computer. We do not assume that the cameras have built-in analytical capability other than capturing images and making them available to a corresponding camera adapter.

The camera adapter allows any type of camera to be used with the LVDBMS using relevant drivers. When processing a scene, the camera adapter first performs scene analysis on the raw image data to determine background pixels and foreground objects. The segmented objects then flow into the frame-to-frame tracking module, which tracks objects within consecutive frames in a camera's field of view and assigns them an object number (unique within the camera adapter). For each image a bag of feature vectors is calculated (a bag is similar to a set but allows for duplication).

Once each frame from the camera is processed, it is bundled as an image descriptor and sent to the spatial processing layer for query evaluation. The image descriptor may contain the actual image bitmap if specifically requested by the spatial processing layer host, but otherwise it contains only image metadata, such as the identifiers and locations of objects identified within the frame, the corresponding feature vectors, the frame sequence number, etc.
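As an illustration, the descriptor might carry fields along the following lines. This is a minimal C# sketch; the type and member names are our own, not the prototype's actual classes:

using System.Collections.Generic;
using System.Drawing;

// Hypothetical sketch of the per-frame metadata bundle sent to the
// spatial processing layer; all names here are illustrative.
public sealed class ImageDescriptor
{
    public string CameraId;              // identifies the originating camera adapter
    public long FrameSequenceNumber;     // temporal ordering of frames
    public byte[] Bitmap;                // raw image data; null unless explicitly requested
    public List<DetectedObject> Objects = new List<DetectedObject>();
}

public sealed class DetectedObject
{
    public int ObjectNumber;             // unique within the camera adapter
    public Rectangle BoundingBox;        // location of the object within the frame
    public List<double[]> FeatureBag;    // bag of feature vectors (duplicates allowed)
}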

    2.1.2 Spatial processing layer

Spatial processing layer hosts evaluate spatial and temporal operators over the streams of image descriptors and provide a result stream to the stream processing layer (Fig. 3, middle).

Fig. 1 LVDBMS hardware architecture

Fig. 2 The LVDBMS is logically grouped into three layers plus a client layer


A server hosting the spatial processing layer will service many cameras, but a camera adapter will be associated with only a single instance of the spatial processing layer. Server replication at this layer allows the LVDBMS to scale to an arbitrarily large number of video streams.

    2.1.3 Stream processing layer

The stream processing layer (Fig. 3, right) accepts queries submitted by clients and partial query evaluation results from the spatial processing layer. It does not interact directly with cameras or their adapters. We note that we can have replication at the stream processing layer for fault tolerance. Queries are decomposed into an algebraic tree structure which is partitioned by the host and pushed down to relevant servers in the spatial processing layer. As sub-queries are evaluated, results are streamed back to the stream processing layer, where they are combined and the final query result is computed. Sub-query results may arrive out of order or be lost in the network, and a camera may unexpectedly go offline; such conditions must be handled gracefully.

    2.1.4 Client layer

Users connect to the LVDBMS and submit queries using a graphical user interface, depicted in Fig. 4. The client allows users to browse available video cameras, define and submit queries, and review results.

    2.2 LVDBMS data model

A query is a spatiotemporal specification of an event, posed over video streams, and expressed in Live Video SQL (LVSQL), a structured query language.

Fig. 3 Software layers of the LVDBMS and the major components contained therein

Fig. 4 The LVDBMS client allows users to browse cameras, construct queries, and send system commands


A query defines which streams will be accessed and what information will be returned. In an LVD, information is contained in live video streams which are input to the LVDBMS in real time. The fundamental construct in an LVD is an object, which is either indicated by the user (a static object) or automatically detected (a dynamic object).

We provide a brief description of the LVD data model and refer the reader to [25] for additional details. A video stream consists of temporally ordered frames where each frame represents a snapshot of what was detected by an image sensor at a particular time. An object, then, is some real-world physical entity whose image was captured and is represented in the frame. (For our definition of object, we do not consider the background to be an object unless it is specifically indicated in the query.) There are two types of operators in LVDs: spatial and temporal. Spatial operators are formulated over objects that are visually captured in video streams (i.e., overlaps, meets, disjoint, exists, etc.). Temporal operators evaluate the temporal relationships between spatial events (i.e., before, during, etc.). When constructing LVSQL statements, there are three types of objects that may be referenced:

1. Static objects are indicated by the user by drawing an outline on a frame captured from the video stream, at query submission time.

2. Dynamic objects are objects that appear in a video stream and are not part of the background. They may be specified as an asterisk (*) in the query.

3. Cross-camera dynamic objects are dynamic objects detected in a camera and matched with an object that was (or is) viewed in another camera. In the query language, these objects are denoted with a pound sign (#), e.g., Before(Appear(V1.#), Appear(V2.#), 120), which queries for an object that appears in stream V2 within 120 s of appearing in V1.

As events occur in real time, queries must be resolved in real time as well. Queries that require temporarily storing historical video data are always parameterized such that the data that must be retained for query evaluation is confined to a temporal sliding window. The resolution of a query refers to the frequency with which the query is evaluated. For example, a query that is evaluated five times each second is of finer resolution than a query evaluated once each second.
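A minimal sketch of such a sliding window follows (C#, our own naming): frames are enqueued as they arrive, and the oldest are discarded once the window capacity, derived from the active queries' requirements, is exceeded.

using System.Collections.Generic;

// Illustrative bounded sliding window; not the prototype's actual class.
public sealed class SlidingWindow<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly int capacity;   // derived from active-query requirements

    public SlidingWindow(int capacity) { this.capacity = capacity; }

    public void Push(T item)
    {
        items.Enqueue(item);
        while (items.Count > capacity)
            items.Dequeue();         // discard data older than the window
    }

    public int Count { get { return items.Count; } }
}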

    2.3 LVDBMS query language

The LVSQL query language is used to pose continuous queries over events that occur in live video streams in real time. The essential form of an LVSQL query is as follows:

ACTION <Action>
ON EVENT <Event>

where the <Event> syntax is expanded upon in Fig. 5. In LVSQL, spatial operators take objects as arguments, and temporal operators take the output of spatial operators (i.e., bit streams) as arguments. Boolean logic combined with the various temporal and spatial operators results in a very expressive language capability.

All queries must involve a spatial operator; the simplest expressible query could check for the existence of any object appearing in the field of view of a particular camera. This spatial query could then be enhanced with a temporal component, for example, duration: trigger an alarm if an object appears and persists for longer than 10 min; or two spatial operators could be combined with a temporal operator: alert if an object contacts a particular desk, then walks through a door.

    3 Privacy filter framework

In this section, we introduce our privacy framework and explain how it is applied. The implementation details are discussed in Sect. 4. While members of the public generally accept having their image recorded by cameras, it is a violation of their trust to use their data for purposes people may find intrusive, or to have their image used for reasons contrary to the known usage of the cameras.

Fig. 5 LVSQL event specification syntax


Examples of intrusive uses could be a security guard observing shoppers for personal edification during a rest period, or corporate mining of the behavior of individuals in a store so that items may be marketed to them the next time they enter that store. As cameras become ubiquitous in public locations, camera networks become smarter, and storage capacity increases such that larger volumes of video data may be retained for lengthier periods of time, it is an increasing concern that video data collected for one purpose may be used for other purposes. If an intrusive usage were known to the target individual, they might have chosen not to participate, for example by shopping in a competing mall. In this article, we introduce privacy filters as the framework which could anonymize the people who are observed by networked video cameras in accordance with a privacy policy. Depending upon the specifics of the privacy policies being enforced, global trends in the videos could still be visible, such as people going into and out of a room, while the appearances of the individuals could be redacted, thus minimizing the potential for misuse of the video and unintended consequences for the people being observed.

We endeavor to protect the identity of innocent individuals. However, some users need the option of investigating and identifying individuals (if they have the authority to do so). In this scenario, the LVDBMS is designed to allow someone with the proper access to investigate the identity of individuals via unperturbed video streams, while applying privacy filters to protect individual privacy by blocking video stream consumers without sufficient access privileges. Thus, privacy enforcement does not affect the intended utility or intelligibility of the video. The challenge is to accomplish this in real time without restricting who consumes the stream, so that actions triggered by events occurring in the stream are timely and relevant.

    3.1 Scope and assumptions

This section describes the objectives of the transformation that privacy filters induce upon corresponding video streams. It also defines the scope of implementation assumptions inherent to our LVDBMS prototype.

    3.1.1 Scope of privacy applied to video stream output

The proposed privacy framework is based on the concept of privacy filters. A privacy filter implements a privacy policy, which specifies under what circumstances the appearance of objects in video streams passing through the filter may be observed or must be redacted from the stream. The primary goal of privacy filters is to obfuscate the appearance of qualifying objects such that they become unidentifiable after passing through the privacy filter. More precisely, a privacy filter defines a set of criteria. These criteria are matched against objects that are identified in video streams, and can be specified very precisely (e.g., objects that satisfy a particular query condition based upon temporal and spatial location) or generally (e.g., applied to all objects observed by a particular camera). Thus, the scope of privacy filters is the salient objects appearing in the video stream; not the environment, such as the scene background observed by a camera, or other conditions that can be observed such as time of day, conjectures based upon knowledge of the location of the camera from which the video stream originates, etc. Furthermore, privacy filters are applied to video streams as a final step before the streamed data is externalized from the system. In order to maximize query accuracy, queries and internal indices for object tracking are based upon metrics calculated from non-obfuscated data. This raw metadata is never externalized by the LVDBMS or explicitly saved to persistent storage.

    3.1.2 Scope of system prototype implementation

The focus of this research is privacy policies, the realization of those privacy policies as privacy filters, and the corresponding transformations privacy filters have upon video streams. We do not consider as part of this work aspects of system security that must be addressed in an actual physical deployment to a public area. For example, we do not consider the physical aspects of the system, such as the physical security of servers hosting LVDBMS software, cameras, and the communications channels between cameras and LVDBMS hosts. However, we note that such things can be accomplished by other means, such as purchasing fixtures to hold the cameras and enabling encrypted communication tunnels via the operating system or through virtual private networks. Furthermore, we do not attempt to detect and thwart privacy attacks against the system, such as a series of specifically crafted queries issued by a user and designed to leak unintended information. Although we do implement certain safeguards, such as providing a mechanism to restrict which cameras and video streams a user can observe, we assume a user is who she presents herself to be, and not a malicious user masquerading as a legitimate system user.

    3.2 Framework overview

Privacy filters may be applied at different levels in the LVDBMS system hierarchy, and video streams may be affected by multiple privacy filters. This cascade of privacy filters is conceptually similar to how views may restrict columns in a traditional relational database (Fig. 6, left).


When a video stream passes through multiple privacy filters, the effect is that the most stringent privacy level is applied (note that a privacy filter does not necessarily apply to every object appearing in a particular frame of a passing stream; Fig. 6, right). Similarly, in a relational database a user may be allowed to access only views, and relational views may be built upon other views, which may themselves reference the physical tables or yet other views.

In the LVDBMS hierarchy, privacy filters are associated with cameras, queries, user groups, and view objects:

Camera: Any camera within the system can have a privacy filter associated with it. Applied at this level, the privacy filter has the broadest impact, as it affects all consumers of this camera.

Query: A privacy filter at this level has a moderate impact, as it is associated with a specific query. It affects only the consumers of the query's output.

User Group: A privacy filter at this level has the narrowest impact. Only the users in this group are affected.

View: A view, implementing a privacy filter, may be defined over a stream or a previously defined view. Queries and users may access the underlying video stream through the view, with the constraint that the privacy filter will be applied to the view's output.

    3.3 Filter output sanitation model

While the previous section provided a conceptual definition of privacy filters, for clarity this section gives a more precise treatment. Let Q be the set of active queries posed over the set of streams 𝒮 in the LVDBMS. A stream S ∈ 𝒮 in the stream processing layer is a first-in first-out (FIFO) sequence of frames S = {f_i, f_{i−1}, …, f_{i−k+1}}, where k is the lesser of the maximum number of frames required to resolve any active query q ∈ Q and a system-defined maximum. (Note that frame f_i ∈ S for any S represents the most recent image captured by a camera, after a negligible processing and communication delay, and S is maintained in real time as frames are received from spatial processing layer hosts.)

Frames in S are retrieved via a frame access function

Acc(S, k) → frame

which retrieves the frame and corresponding metadata in the kth position of S. When a stream is externalized from the LVDBMS (such as for display on a user's terminal, saving to a file, etc.), it is passed through a sanitizer function

San(S, f) → Acc(S, 1) ⊗ Z(f)

where Z returns a mask that indicates which regions of the frame to obscure in accordance with the privacy filter f, and ⊗ obfuscates the image bitmap contained in the frame with the mask, perturbing the output of San. When selecting from a view with a privacy filter f′, San becomes

San(S, f) → Acc(S, 1) ⊗ Z(f ∗ f′)

where ∗ combines the filters as described in the previous section. In the literature, some sanitizers choose not to answer queries, or add noise drawn from a statistical distribution. However, Z is deterministic in its parameter f. Furthermore, detecting an attack (that is, a determination of information the system attempts to redact with the mask) is beyond the scope of this study, and all queries are assumed to be legitimate.
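The following is a minimal C# sketch of how San and Z might be realized. Frame, PrivacyFilter, and all member names are our own illustration of the formalism, not the prototype's types; the convention assumed here is that index 0 of the stream holds the most recent frame.

using System.Collections.Generic;
using System.Drawing;

// Sketch types mirroring the formal model; all names are illustrative.
class DetectedObject { public Rectangle BoundingBox; }

class Frame
{
    public int Width, Height;
    public Color[,] Pixels;
    public List<DetectedObject> Objects = new List<DetectedObject>();
}

class PrivacyFilter
{
    // True when the filter's criteria match the object
    // (target status, object scope, temporal scope, ...).
    public virtual bool AppliesTo(DetectedObject o) { return true; }
}

static class Sanitizer
{
    // San(S, f): take the newest frame Acc(S, 1), compute the mask Z(f),
    // and obfuscate the masked regions of the bitmap.
    public static Frame San(IReadOnlyList<Frame> s, PrivacyFilter f)
    {
        Frame frame = s[0];                       // Acc(S, 1): most recent frame
        foreach (var o in frame.Objects)
            if (f.AppliesTo(o))                   // Z is deterministic in f
                Redact(frame, o.BoundingBox);
        return frame;
    }

    static void Redact(Frame frame, Rectangle r)
    {
        for (int x = r.Left; x < r.Right; x++)
            for (int y = r.Top; y < r.Bottom; y++)
                frame.Pixels[x, y] = Color.Gray;  // flat fill; a blur also works
    }
}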

Fig. 6 Relational database view (left) compared with an illustration of a cascade of privacy filters (right)


    3.4 PSL

The PSL allows a system administrator to implement privacy policies by constructing privacy filters, and to manage system user access. Privacy filters can be associated with groups, and thus with individual users through their user-group membership. When applied to a group, a privacy filter affects all users in the group. Privacy filters may also be associated with views, in which case they apply to any accessor of the view. (Privacy filters may also be associated with cameras, but this is specified in the configuration file associated with the camera's adapter.) All privacy filters are cumulative; the system does not provide a way to reduce privacy by adding a new privacy filter.

When a user creates a query, their privacy filters (via group memberships) are in turn associated with the query. However, a system administrator can create canned queries which users may run unaffected by the privacy filters associated with the executor's account. For example, such a query could save an unperturbed video stream to a persistent storage location the user does not have access to, if they believe a crime is occurring in the view of a camera to which they do not have unrestricted access. The syntax for the PSL is provided in Fig. 7.
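As an illustration, the following statements (using the Fig. 7 syntax, with hypothetical identifiers) create a filter that permanently masks dynamic objects and attach it to a user group:

CREATE FILTER mask_dynamic
  TEMPORALSCOPE = PERMANENT
  OBJECTSCOPE = DYNAMIC

CREATE USERGROUP visitors

ASSOCIATE USER alice WITH visitors

ASSOCIATE GROUP visitors WITH FILTER mask_dynamic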

    3.5 Design and implementation of privacy filters

This section presents operational and implementation details of privacy filters in the LVDBMS. When two or more privacy filters apply to a stream, they are combined into an effective privacy filter.

An active query is a currently running query in the LVDBMS system. Each query has operators that take object(s) from cameras (i.e., spatial operators) as input. A relevant object is an object that appears in a video stream referenced in an active query, and can potentially contribute to the query evaluating to true. If the query becomes true, the contributing relevant object is called a target object; otherwise, it is a non-target object. Consider the query Contains(C1.S1, C1.*) illustrated in Fig. 8. This query determines whether any dynamic object is detected within the static object S1, represented by the dashed rectangle. Since the dynamic object D121 is contained within S1, D121 is a target object. Since the other dynamic object, D102, is not contained in S1, it is a non-target object. A privacy filter can be specified so that only target objects, only non-target objects, or all relevant objects should be protected (i.e., have their appearance obscured). If a protected object no longer satisfies a privacy filter specification, this object obtains the status previously masked. In this example, if the privacy filter is to blur all the target objects, then the dynamic object D121 is a previously masked object after it leaves the boundary of S1. We note that displaying a live video stream from a given camera is similar to a query which always evaluates to true and whose output video stream is the same as the input video stream.

{CREATE | UPDATE | DELETE} FILTER filter_identifier
  [TARGET = {QUERYTARGETS | NONQUERYTARGETS | PREVIOUSLYMASKED}]
  [TEMPORALSCOPE = {QUERYNONACTIVE | QUERYACTIVE | PERMANENT}]
  [OBJECTSCOPE = {STATIC | DYNAMIC | CROSSCAMERADYNAMIC}]

{CREATE | UPDATE | DELETE} VIEW view_identifier OVER stream_identifier
  [WITH filter_identifier]

{ASSOCIATE | DISASSOCIATE} GROUP group_identifier WITH {FILTER | VIEW} filter_identifier

{CREATE | DELETE} USERGROUP group_identifier

{ASSOCIATE | DISASSOCIATE} USER user_identifier WITH group_identifier

Fig. 7 Privacy specification language syntax; uppercase represents a keyword and italics a user-supplied parameter

Fig. 8 QueryTargets versus NonQueryTargets: D121 satisfies the query condition and is a target


    3.6 Defining privacy filters

Privacy filters are created in the LVDBMS in one of two ways: they are specified in configuration files at system startup, or they are created with the CREATE FILTER PSL statement. A privacy filter is specified by the attribute 3-tuple {Target, TemporalScope, ObjectScope}, where Target, TemporalScope, and ObjectScope are quantified in Tables 1, 2, and 3. Not all attributes are applicable to every use of privacy filters, and an attribute may be set to None. The Target parameter specifies whether the filter applies to only target objects, non-target objects, all relevant objects defined over the camera (static objects defined by users and dynamic objects detected in the video stream), or None, indicating that this attribute should not be considered when determining whether a privacy filter applies to a particular object. ObjectScope refers to the type (scope) of objects the filter applies to: static or dynamic objects, cross-camera dynamic objects, none, or all objects appearing in the stream. TemporalScope indicates when the filter will be applied: always, never (the filter is currently inactive), or when the stream is or is not being accessed by a query.
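To make the 3-tuple concrete, the following C# sketch shows one way it might be represented; the type and member names are ours, with the enumeration members taken from Tables 1, 2, and 3:

// Sketch of the privacy filter 3-tuple; names are illustrative.
enum Target { None, QueryTargets, NonQueryTargets, PreviouslyMasked, All }
enum TemporalScope { None, QueryNonActive, QueryActive, Permanent }
enum ObjectScope { None, CrossCameraDynamic, Dynamic, Static, All }

struct PrivacyFilterTuple
{
    public Target Target;                // who is masked (Table 1)
    public TemporalScope TemporalScope;  // when masking applies (Table 2)
    public ObjectScope ObjectScope;      // which object classes qualify (Table 3)
}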

    3.7 Privacy filters applied to cameras

Logically, we may associate privacy filters with physical cameras, but to make the LVDBMS software more flexible with respect to the types of cameras that can be used with the system, privacy filters are actually evaluated by the camera adapter.

When the camera adapter is initialized, it takes its initial configuration from a configuration file. The initial state of its privacy filter can be specified in this file and will persist for the lifetime of the adapter's state. (When the LVDBMS is in operation, an operator may specify new default privacy filter settings.) A camera's privacy filter is maintained in the camera adapter, as sketched below.
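The prototype reads its settings from XML files (Sect. 4.1); a hypothetical adapter configuration fragment, with an invented schema, might look like the following:

<!-- Hypothetical camera adapter configuration; the element names are
     illustrative, not the prototype's actual schema. -->
<cameraAdapter id="C1">
  <defaultPrivacyFilter>
    <target>NonQueryTargets</target>
    <temporalScope>Permanent</temporalScope>
    <objectScope>Dynamic</objectScope>
  </defaultPrivacyFilter>
</cameraAdapter>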

Example 1: If a camera has a privacy filter with an attribute set to None, then by itself that filter will have no effect. However, when combined with the privacy filter of an active query, it can elevate the privacy state. For example, a camera with a default privacy filter of {All, None, None} will not result in any effective privacy state when a query is evaluated with its images. However, a query with privacy filter {QueryTargets, QueryActive, Dynamic} will yield the effective privacy state {All, QueryActive, Dynamic}. The difference is that all dynamic objects will be obscured, instead of only those that are query targets.

The attribute values of a privacy filter apply at the camera level as follows. The Target parameter specifies whether the filter applies to objects that are the targets of active queries (QueryTargets), objects that are not targets of active queries (NonQueryTargets), all objects defined over the camera, whether static, dynamic, or cross-camera dynamic (All), or no objects (None). PreviouslyMasked refers to objects that previously qualified for inclusion in a privacy filter (i.e., they were a non-query target in a camera with the NonQueryTargets attribute set, were a query target under the QueryTargets attribute, etc.). We note that, from the perspective of a camera, an active query is a query that (1) has been issued to the LVDBMS system; (2) is not expired; (3) has not evaluated to a condition that executes an action causing the query to terminate; and (4) has an operator that takes as input object(s) from said camera.

A query target is an object that appears in the field of view of a camera and satisfies two conditions:

Table 1 Privacy filter values for parameter type Target

Attribute         Description                                              Priority
None              No privacy                                               1
QueryTargets      Targets of active queries are obscured                   2
NonQueryTargets   Objects that are not targets of active queries are       2
                  masked; an active query may obscure their identity
PreviouslyMasked  Objects that were previously masked continue to be       2
                  masked
All               All object identities are masked, regardless of          3
                  query status

Table 2 Privacy filter values for parameter type TemporalScope

Attribute        Description                                               Priority
None             No privacy                                                1
QueryNonActive   Privacy settings apply only when a query is not active    2
QueryActive      Privacy settings apply only when a privacy-enabled        2
                 query is active (in the case of privacy applied to a
                 camera, for example)
Permanent        Privacy settings apply for the lifetime of the object,    3
                 camera, or query

Table 3 Privacy filter values for parameter type ObjectScope

Attribute            Description                                           Priority
None                 No privacy (no relevant objects qualify)              1
CrossCameraDynamic   Objects that are first detected in another camera     2
Dynamic              Dynamic (automatically detected) objects              2
Static               Static (user-defined) objects                         2
All                  All classes of objects qualify                        3


(1) it is a static object, dynamic object, or cross-camera dynamic object associated with said camera, and (2) it is referenced (directly in the case of a static object, or indirectly in the other two cases) by an operand in an active query over the camera in which it appears. In Fig. 8, the dynamic object D121 is contained within an active query and is a QueryTarget; D102 is not associated with a query and has the status NonQueryTarget.

The ObjectScope privacy filter attributes are the different classes of objects identified by the LVDBMS and explained previously. The TemporalScope attribute of a privacy filter applies at the camera level as either None or Permanent; we currently do not support a more granular temporal operator.

    3.8 Privacy filters applied to queries

Privacy filters may also be associated with queries. A default privacy filter may be configured at the system level (in the configuration corresponding to the stream processing layer host), or one may be specified when the query is instantiated. Once a query is associated with a privacy filter, that privacy filter is retained with the query for the lifetime of that query. A query's privacy filter is kept in the stream processing layer Query Executive along with other query metadata, such as the number of sub-queries the query has been decomposed into, which stream processing layer hosts the query has been sent to, etc.

When applied to a query, the TemporalScope attributes (Table 2) have only two distinct behaviors, None and Permanent (QueryActive is treated as equivalent to Permanent, since both refer to the lifespan of the query).

Example 2: A Department of Transportation (DOT) Traffic Management Center (TMC) makes available live video feeds from cameras mounted along major roadways for broadcast on nightly television news segments. The TMC provides these video feeds to allow the public to observe traffic pattern trends in real time (such as how quickly traffic is moving across a particular bridge), and for news broadcasters to announce traffic incidents causing lane blockages. However, TMC personnel do not want to embarrass individuals involved in specific traffic incidents, or to broadcast identifiers such as license plate numbers. The TMC objective is to provide real-time video with a blur applied to all objects. Object types (e.g., car, truck, and pedestrian) and colors can be distinguished by observers, but not individual identifiers such as faces and license plate numbers (Fig. 9). (Note that only TMC personnel can control camera functions such as zoom.) This is accomplished by creating a query with an Appear() operator and a corresponding privacy filter {None, None, All}. A privacy filter must be associated with the query because, by default, the query would run with the privacy filter of the user who created it; since a TMC operator does not have her view restricted, a query from a TMC operator would otherwise have no privacy filter applied to it.
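Expressed with the constructs above, the canned broadcast query might look roughly like the following; the camera identifier and action name are hypothetical, and the query would be instantiated with the privacy filter {None, None, All} rather than the operator's own filters:

ACTION StreamToBroadcastFeed
ON EVENT Appear(C12.*)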

    3.9 Privacy filters applied to users

Default privacy filters may also be defined that apply to users of the system who connect through the client GUI and can browse video cameras. When a user's GUI connects to the stream processing layer host, the connection is registered with the Session Manager, which records connectivity information such as the client IP and port, starts a heartbeat service, and associates a privacy filter with the registration if necessary. The heartbeat service keeps track of clients who are connected to the system and deallocates resources for clients who disconnect. Clients can run queries which do not have actions specified, but which continuously return evaluation results to the GUI for the user to watch. Such queries will be aborted if the corresponding client remains disconnected for a period of time. When applied to a user, the TemporalScope attributes (Table 2) have only two effective behaviors, None and Permanent (where Permanent refers to the time the client is connected to the LVDBMS). The other attributes behave as described previously.

Example 3: A security monitoring application is written for the LVDBMS. It has a predefined set of queries the security guards can select from and view on their GUI. They can also monitor any camera associated with the system, but only see the identities of people who satisfy the query conditions (for example, someone who has been standing in the same place for more than 5 min). If a security guard watches that video feed, the person who has not moved will be unobstructed, but people walking nearby will have their images masked.

Fig. 9 The two dynamic objects depicted here have their details obscured by the privacy filter


After 5 min, a query action is triggered that records the video; this query is not associated with the security guard's privacy filter and records the entire camera view without obstructions.

    3.10 Combining privacy filters

Privacy filters associated with cameras, users, and queries must all be combined to determine which objects in which frames of the video will have their identities obscured when the video is viewed by a user or saved to a persistent file. When a user requests to view video from a camera, the user's privacy filter is sent to the corresponding camera adapter. It is combined with the camera's and view's privacy filters (if applicable), and then the user's GUI connects directly to the camera adapter to receive live video. In the case of a query, the camera's privacy filter is sent as metadata, along with the image descriptor, to the corresponding spatial processing layer host. If a query action requires a live video stream (e.g., to save it to disk or direct it to a video monitor), then the query's privacy filter will be pushed down to the spatial processing layer host which is connected to the camera adapter.

When privacy filters interact at multiple levels, the effective privacy filter must be calculated. Each attribute in the privacy filter 3-tuple has a value which is assigned a priority (or no value is specified, in which case that attribute is not factored into the privacy calculation). When combining attributes, the highest-priority attribute value is taken, where a higher priority corresponds to more object observations being redacted from the output video stream. Attribute priorities are specified in the Priority column of Tables 1, 2, and 3. When combining privacy filters, if they have different attribute values with the same priority for the same attribute, then the effective attribute value chosen is the value at the next higher priority for that attribute. This procedure must be repeated each time a new query becomes associated with a camera object, or a query expires.

Example 4: Given a camera object with privacy filter {NonQueryTargets, QueryActive, CrossCameraDynamic} and a query object associated with the camera with filter {QueryTargets, QueryActive, Dynamic}, the effective privacy filter will be {All, Permanent, All}. That is, the priorities are {2,2,2} and {2,2,2}; where the tuples have different attribute values of equal priority, this is reconciled by giving the attribute the value at the next higher priority.
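A sketch of this combination rule follows (C#, reusing the enums sketched in Sect. 3.6; the helper names are ours). Note that Example 4 yields Permanent from two identical QueryActive values, which suggests the level-specific equivalences (e.g., QueryActive treated as Permanent for queries, Sect. 3.8) are normalized before combination; the sketch implements only the core priority rule.

using System;

static class FilterCombiner
{
    // Priority per Tables 1-3: None = 1; All and Permanent = 3; others = 2.
    static int Priority<T>(T attr) where T : struct, Enum
    {
        string name = attr.ToString();
        if (name == "None") return 1;
        if (name == "All" || name == "Permanent") return 3;
        return 2;
    }

    // Combine one attribute of two filters: keep the higher-priority value;
    // on a tie between different values, escalate to the next higher priority.
    public static T Combine<T>(T a, T b) where T : struct, Enum
    {
        if (a.Equals(b)) return a;
        int pa = Priority(a), pb = Priority(b);
        if (pa != pb) return pa > pb ? a : b;
        foreach (T v in Enum.GetValues(typeof(T)))
            if (Priority(v) == pa + 1) return v;  // e.g. All or Permanent
        return a;                                 // already at the top priority
    }
}

With the Sect. 3.6 enums, Combine(Target.NonQueryTargets, Target.QueryTargets) escalates to Target.All, and Combine(TemporalScope.None, TemporalScope.QueryActive) yields TemporalScope.QueryActive, matching Examples 1 and 4 (up to the TemporalScope normalization noted above).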

    3.11 Tracking based upon a multifaceted object model

In order to implement the cross-camera dynamic object operand in LVSQL, we developed a camera-to-camera tracking technique based upon constructing an appearance model of the objects appearing in video streams. Objects are tracked from frame to frame using a traditional tracking technique (e.g., [32]), which we refer to as a frame-to-frame tracker since it tracks objects within a single video stream. When an object appears in a consecutive sequence of frames, the frame-to-frame tracker assigns a unique identifier to the object as part of the tracking process. A feature vector based upon the object's appearance is also calculated. An object is represented as a bag of multiple instances [6, 7], where each instance is a feature vector based upon the object's visual appearance at a point in the video stream. Therefore, an object can be viewed as a set of points in the multidimensional feature space, referred to as a point set (Fig. 10). Note that the k instances in a bag are derived from k samplings of the object, which are not necessarily taken from consecutive frames.

Fig. 10 Multifaceted object representation model in which an object is represented by its point set (i.e., feature vectors)

A FIFO database holds the multiple-instance bags of objects recently detected by the different cameras in the system. As a new observation becomes available, the bag is updated by adding the new instance and removing the oldest instance. As surveillance queries generally concern real-time events that have occurred recently, the FIFO database is typically very small, and in our prototype we implemented it as a distributed in-memory database system (distributed among the spatial processing layer hosts).

Cross-camera tracking is performed as follows. When an object is detected by a camera, its multiple-instance bag is extracted from consecutive frames in the video stream and used as an example to retrieve a sufficiently similar bag in the distributed object-tracking database. If there exists another bag sufficiently close, based upon the squared distance metric, then the two bags are considered to correspond to the same object. On the other hand, if the system does not find a sufficiently similar bag, the occurrence of this newly detected object is considered the object's first appearance in the system.

To support the retrieval operations, the distributed in-memory database needs to compute the similarity between bags. Given two bags of multiple instances

X = {x_1, x_2, …, x_k}  and  X′ = {x′_1, x′_2, …, x′_k},

where k is the cardinality of the bags, we compute their similarity as follows:

d_m(X, X′) = min_{(s_i, s′_i)} Σ_{i=1}^{m} ‖x_{s_i} − x′_{s′_i}‖²,

where the minimum is taken over pairings (s_i, s′_i) of instances from X and X′, m ≤ k is a tuning factor, and ‖x_{s_i} − x′_{s′_i}‖² is the squared distance between the two vectors. This distance function computes the smallest sum of pairwise distances between the two bags. Although we can set m = k, a smaller m value is more suitable for real-time computation of the distance function. For instance, if m = 1, two objects are considered the same if their appearances look similar according to some single observation. We set m = 5 in our study. Traditionally, each object is represented as a feature vector, i.e., a single point, instead of a point set, in the multidimensional feature space. This representation is less effective for object recognition. For clarity's sake, let us consider a simple case in which two different persons currently appear in the surveillance system. One person wears a 2-color t-shirt, white in the front and blue in the back. The other person wears a totally white t-shirt. If the feature vectors extracted from these two persons are based on their front view, the two would be incorrectly recognized as the same object. In contrast, the proposed multifaceted model also takes into account the back of the t-shirt and will be able to tell them apart. The bag model is thus more indicative of the objects.

The FIFO database is implemented as a series of queues and hash tables residing in the spatial processing layer hosts. Each host maintains indices (hash tables) over the bags of objects observed in video streams from the corresponding camera adapters. The indices associate objects with the specific video frames in which they were observed, video frames with video streams, objects with bags, objects with queries over the objects' corresponding video streams, etc. An object appearing in two separate video streams will have two separate bags in the index and two separate identifiers (the camera adapter identifier, local object tracking number, and host identifier concatenated into a string). If the two objects are determined to be the same object, their bags are merged and the index updated such that both object identifiers point to the same (merged) bag of observations.
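The merge step can be sketched minimally as follows, assuming a string-keyed hash index from object identifiers to bags (our own naming, not the prototype's):

using System.Collections.Generic;

static class TrackingIndex
{
    // When two identifiers are found to denote the same physical object,
    // merge their observation bags and point both identifiers at the result.
    public static void MergeBags(Dictionary<string, List<double[]>> index,
                                 string idA, string idB)
    {
        var merged = new List<double[]>(index[idA]);
        merged.AddRange(index[idB]);
        index[idA] = merged;
        index[idB] = merged;   // both object identifiers now share one bag
    }
}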

Example 5: Cross-camera tracking allows queries to be issued that consider multiple cameras to determine whether an event has occurred. For example, consider employees who work in a building that does not permit smoking inside, but has a back door and a bench next to the door for smokers to sit on. When someone comes out of the building and sits on the bench, we assume it is an employee. When someone comes down a nearby street and waits by the door, their motives are unknown to us and a security guard should be notified to observe the situation. In the LVDBMS, this can be accomplished by creating a query over both cameras with a cross-camera dynamic operand, to detect when someone appears in the street camera (which does not observe the bench or door) and then appears in the camera observing the back door and bench. Thus, query targets are objects appearing first in the one camera and then in the second. An associated privacy filter would obscure non-query targets (objects appearing in only one camera, or appearing in the back door camera and later in the street view camera). The privacy filter would be {NonQueryTargets, None, None} and the query Before(Appear(V1.#), Appear(V2.#)), where V1 shows the street and V2 the back door.

    4 Evaluation

This section describes the experimental conditions in which the LVDBMS software was evaluated, along with the experimental results. The focus of the LVDBMS is on real-time environments, as opposed to systems that operate only on pre-recorded video. Thus, periodic activities such as query evaluation must take less time than the interval at which queries are evaluated; otherwise, the query evaluation queue will grow without bound and query results will not be returned in a timely fashion. In addition, because video streams entering the system from cameras are unbounded (a camera can always be turned on and transmitting video), only a small amount of data can be retained within a sliding window before it must be discarded to make room for new data. Thus, implementing privacy protection in real time differs from, and is more challenging than, doing so off-line because (1) the time available to carry out data processing operations is bounded, and (2) due to storage limitations, only a small portion of the video data may be retained at any particular time in its raw (unsummarized) format and must be processed in a single pass over the data. In off-line processing, the video data is stored and can be processed with multiple passes over the data, for example, to create an index structure to be used in a later processing stage.

    4.1 Experimental setup

To test the effectiveness of the LVDBMS, we utilize three sets of videos, where each video set satisfies a different


    objective. We number the data sets from 1 to 3 as follows:

(1) We created a series of reference videos by placing cameras in three locations in a campus building: inside two laboratories (rooms) and in a hallway. Each laboratory has slightly different lighting with no external windows, and the hallway has exterior windows along one wall. This provides reference videos with changing lighting conditions, and the subjects at times are obscured by desks, chairs, and tables. This creates a challenging environment in which to track objects from one camera to another. This series of videos involved 5 people, with on average 2 or 3 people appearing in the field of view at any particular time.

(2) Videos from the CAVIAR project (http://homepages.inf.ed.ac.uk/rbf/CAVIAR/) are utilized. These are low-resolution videos that provide coverage of the same scene from two different views, front and side. This is challenging because the video resolution is small by today's standards, and the objects appearing in the videos have relatively few pixels to contribute toward building the appearance model (bag of feature vectors).

(3) We created a series of videos recording traffic on roads (cars, trucks, and a few pedestrians were observed). Automobiles are rigid objects that do not change shape while we observed them driving, and they move in patterns (constrained by the road). This scenario provides an excellent testbed for spatial operators, such as Appears, North, West, etc., with approximately perfect tracking accuracy within a video stream.

We evaluated the LVDBMS with pre-recorded videos so that the same conditions could be simulated with different configuration parameters. In our LVDBMS implementation, image frames are presented to the camera adapter in one of two ways: an initial processing thread either reads the image data from a memory buffer that is written to by a device driver for the camera hardware, or the image data are extracted from a video file by a video codec. Once extracted, the frame is enqueued in a new frame queue. A second processing thread retrieves frames from the new frame queue and proceeds to identify foreground pixels from background pixels, and so on. Once a frame of image data has been enqueued on the new frame queue, the frame's original video source is indistinguishable to the rest of the system processing pipeline and has no effect on query processing or other system behavior.
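A minimal sketch of this two-thread hand-off is shown below; the Frame and IFrameSource types are hypothetical stand-ins, and the actual implementation differs in detail.

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    class Frame { /* image data */ }

    interface IFrameSource // a live camera driver buffer or a video codec
    {
        IEnumerable<Frame> ReadFrames();
    }

    class CameraAdapterPipeline
    {
        private readonly BlockingCollection<Frame> newFrameQueue =
            new BlockingCollection<Frame>(boundedCapacity: 64);

        public void Start(IFrameSource source)
        {
            // Thread 1: read frames from the driver buffer or codec and enqueue them.
            Task.Run(() =>
            {
                foreach (var frame in source.ReadFrames())
                    newFrameQueue.Add(frame); // origin is indistinguishable from here on
                newFrameQueue.CompleteAdding();
            });

            // Thread 2: dequeue frames and segment foreground from background.
            Task.Run(() =>
            {
                foreach (var frame in newFrameQueue.GetConsumingEnumerable())
                    SegmentForeground(frame);
            });
        }

        private void SegmentForeground(Frame frame) { /* background subtraction, etc. */ }
    }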

For these experiments, the frame-to-frame tracker is configured to ignore detected objects less than 200 pixels in area, which we consider noise (this parameter is configurable at the camera adapter level). Software in all tiers takes configuration settings from XML files and, to facilitate scripting, also accepts command line arguments.
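A camera adapter configuration file might therefore resemble the following; the element and attribute names here are hypothetical and are shown only to indicate where a setting such as the 200-pixel noise threshold would live.

    <!-- Hypothetical camera adapter configuration; names are illustrative only. -->
    <CameraAdapter id="cam1">
      <Source type="file" path="..." />
      <!-- Detections smaller than this area (in pixels) are treated as noise. -->
      <Tracker minObjectAreaPixels="200" />
    </CameraAdapter>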

LVDBMS core components are implemented in C# and utilize Language Integrated Query (LINQ) to maintain some internal queues and hash indexes. For the experimental results presented in this article, the LVDBMS server layers ran on a Windows 7 computer with a 3 GHz Pentium 4 CPU with Hyper-Threading and 3 GB RAM (Dell Precision 370), and the camera adapters ran on a 2.54 GHz Core 2 Duo Latitude E6500 with 4 GB RAM. For the eight-camera experiment discussed in Sect. 4.4.1, the camera adapters were hosted on a Windows 7 HP Pavilion laptop with a 2.3 GHz quad-core CPU and 4 GB RAM. We use Emgu CV, a .NET wrapper for the Intel OpenCV library, for low-level visual operators in conjunction with the Intel Integrated Performance Primitives (IPP) library.

    4.2 Effectiveness of privacy filters

The purpose of privacy filters is to prevent qualified objects in videos from being identified. In this section, we provide an example of how an object that is obscured by a privacy filter is displayed to users via the LVDBMS client. Figure 11 illustrates two separate situations where privacy filters obscure the identification of detected objects. On the left-hand side, a person is walking toward a door, and on the right-hand side a vehicle is traveling down a street in a traffic-counting application. In these examples, objects are obscured by blurring the pixels contained in the bounding boxes. Applying a blur maintains a visually appealing image in which obscured objects do not significantly stand out, but other options, such as an adaptive blur based upon the size of the bounding box, or simply setting the entire rectangle to a solid color such as black, are possible depending upon how much the appearance of the object should be obscured from the video stream (this choice is not a focus of this work). Additional privacy-preserving measures, such as increasing the size of the box to mask the size and shape of the object being protected, are another option, with the tradeoff of decreasing the utility of the video (as more of the video is obscured to the viewer).
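The following sketch illustrates the two simplest rendering choices over a raw 32-bit BGRA frame buffer, solid fill and average-color fill; it is a minimal stand-in for the actual rendering code (which uses Emgu CV), and the method signature is our own.

    // Illustrative only: obscure a bounding box in a 32-bit BGRA frame, either
    // with solid black or with the average pixel color of the region.
    // Assumes a non-empty rectangle fully inside the frame.
    static void ObscureRegion(uint[] pixels, int width,
                              int x, int y, int w, int h, bool useAverageColor)
    {
        uint fill = 0xFF000000; // opaque black
        if (useAverageColor)
        {
            ulong b = 0, g = 0, r = 0, n = (ulong)w * (ulong)h;
            for (int row = y; row < y + h; row++)
                for (int col = x; col < x + w; col++)
                {
                    uint p = pixels[row * width + col];
                    b += p & 0xFF; g += (p >> 8) & 0xFF; r += (p >> 16) & 0xFF;
                }
            fill = 0xFF000000 | ((uint)(r / n) << 16) | ((uint)(g / n) << 8) | (uint)(b / n);
        }
        for (int row = y; row < y + h; row++)
            for (int col = x; col < x + w; col++)
                pixels[row * width + col] = fill;
    }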

Privacy filter effectiveness is a function of the effectiveness of the object detection logic and, depending upon the query, the tracking logic. In the left image in Fig. 11, the person walking satisfies the query condition Appears(), as do false positives (FPs) identified by the background segmentation algorithm due to a person walking through a door and the door closing. In this case, the relevant object's identity is obscured, as are four FP areas (which can be reduced by adjusting camera parameters), and the privacy condition is satisfied. In this section, we do not provide a separate table of privacy evaluation results because, had the queries presented in Table 4 had privacy filters enabled, the privacy filter effectiveness would be exactly the tracking accuracy presented in the third column of Table 4.

The accuracy of privacy filters in correctly obscuring the appearance of an object from a video stream is related directly to the specification of the privacy filter and, if it is associated with a query, the query. For example, a privacy filter that obscures all objects in a video stream depends upon the object segmentation algorithm to correctly distinguish objects from the video background. Today's background segmentation algorithms are very accurate, and in the cases where they err (such as complex moving backgrounds), the errors would be incorrectly considered objects and have their appearances obscured. Likewise, tracking algorithms that track objects viewable in consecutive frames of video in a single camera are also accurate, but less accurate than the simple foreground/background extraction. Similarly, tracking objects from one camera to another is a more difficult problem and, as indicated in Table 4, is less accurate still. If two objects appear visually the same in two cameras, it is a difficult problem to determine whether they are in fact the same object without an additional aid such as a security ID card with an RFID tag. To maximize privacy in these latter situations, one could construct a privacy filter that obscures all visible objects (rather than filtering based upon query target or non-query target), thus minimizing the effect of query accuracy. If an object has an appearance that is sufficiently similar to the background, then it would not be detected by the background segmentation algorithm and would not have a privacy filter applied to it. As soon as it moves or its appearance changes such that it looks sufficiently distinct from the background around it, it is recognized as a salient object by the LVDBMS and any applicable privacy filters apply to it.

    4.3 Privacy filter demonstration

This section provides several scenarios showing the application of privacy filters. The first demonstrates a transportation application in which a Transportation Management Center (TMC) operator's terminal and a live news feed originate from the same traffic camera. The query, Appears(v1.*, 250), which monitors for objects in the video stream sized 250 pixels or greater, is active on the camera, v1. The live news feed is served through a view that has associated with it a privacy filter specifying that query targets should be obscured, as illustrated in Fig. 12.

Figure 13 shows a screenshot of the traffic camera video feed as the TMC operator would observe it. Through the LVDBMS, the operator views images from the camera that are not associated with any privacy filters. The live video provided for television, however, has access only to video obtained from the view, view1. This view has a privacy filter associated with it, which applies to all objects that are query targets, that is, objects that might contribute to a query evaluating to true. This privacy filter has an effect only when a query is active (in this example the query monitors for objects appearing in the video stream that are larger than a specified area in pixels). Figure 14 provides an example of video observed through view1 with the query active.

Fig. 11 Examples of objects' identities obscured by privacy filters. The left image is from data set (1) and the right from data set (3)

Table 4 Continuous query evaluation results

Query name | Description | Accuracy
Appear | True if an object with area greater than 100 pixels appears in the frame, else false | 100%
North before south | True if there exists an object moving with downward motion. The Before operator has a window size of 20 frames; if the object stops or changes direction for fewer than 20 frames it is still considered true | 100%
Appear across cameras | A person appears in camera 1 and then is recognized when they appear in a second camera | 83% (TP = 20, FN = 4)
Appear, then cover across cameras | An object appears in camera 1 then goes through a door (outlined by a static object) in the second camera | 91% (TP = 22, FN = 2)


The next example (Fig. 15) demonstrates privacy filters at multiple levels of the LVDBMS hierarchy. A name plaque is mounted in a corridor and must be obscured in video streams sent to all consumers of this video source. To accomplish this, a static object is defined over the plaque and a privacy filter is associated with the camera (Fig. 16). This camera-level privacy filter will be propagated to all consumers of this video stream and factor into the effective privacy filter computation for each consumer's video stream.

The camera is accessed by two users, User 1 and User 2. User 1 is not explicitly assigned a privacy filter, and User 2 has been assigned a privacy filter that is applicable to all objects of type dynamic. (For example, User 2 might only need to recognize general activity and notify a supervisor, User 1, when a closer review is required.) When the video is viewed from User 1's terminal, the camera-level privacy filter is propagated and applied to the static object drawn around the plaque. The result is that the plaque is obscured in the video output on User 1's terminal (Fig. 17). User 2 is explicitly assigned a privacy filter that applies to all dynamic objects in the video stream. The privacy filter on dynamic objects is propagated to User 2 by the LVDBMS automatically. When the frames are rendered into the video stream for User 2, the privacy filters are combined (per the discussion in Sect. 3.10) and the effective privacy filter applies to all objects identified in the video stream, as illustrated in Fig. 18. Both terminal images in Fig. 18 are from the same video stream, but illustrate two different system configuration settings, depending upon how much detail should be revealed about obscured objects. The upper image has a blur operator applied, which blurs identifying features while providing the operator substantial visual information with which to observe behaviors. The lower terminal image simply applies a bounding box filled with the average pixel color for the region to obscure.
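As a simplified sketch of this combination rule (our own representation, not the LVDBMS data structures), the effective filter can be computed as the union of all filters in the camera/view/user cascade, so the most stringent specification always prevails:

    using System;

    // Illustrative only: object scopes that a privacy filter may target.
    [Flags]
    enum ObjectScope { None = 0, Static = 1, Dynamic = 2, QueryTargets = 4, NonQueryTargets = 8 }

    static class PrivacyFilters
    {
        // Any object matched by any filter in the cascade is obscured.
        public static ObjectScope Effective(params ObjectScope[] cascade)
        {
            var effective = ObjectScope.None;
            foreach (var filter in cascade)
                effective |= filter;
            return effective;
        }
    }

For the scenario above, Effective(ObjectScope.Static, ObjectScope.None) obscures only the plaque for User 1, while Effective(ObjectScope.Static, ObjectScope.Dynamic) obscures every identified object for User 2.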

Relevant object | Privacy filter
Camera | None
User 1: TMC operator | None
User 2: News station live video feed | None
View1 | Target=QueryTargets

Fig. 12 Transportation example showing a video source, v1, providing live video which is consumed by a TMC operator and live television. The accompanying table shows privacy filters associated with various objects in this scenario

Fig. 13 Unmodified video originating from camera, v1, as viewed by the TMC operator, who does not have a privacy filter

Fig. 14 Live video as viewed through the view view1 with privacy filter and active query

4.4 Query evaluation accuracy for event monitoring tasks

Two important aims of the LVDBMS are overall usability and the ability to specify privacy policies in terms of privacy filters.

To be relevant to surveillance applications, the ability to define queries that detect noteworthy events is important. In addition, privacy, and the ability to maintain some level of privacy for objects (i.e., people, identifiable automobiles, etc.) as surveillance systems become more automated and pervasive, is also important. Thus, in this study, we redesigned the LVSQL query language to be more concise and easier to use. Query accuracy is the accuracy with which the LVDBMS detects user-posed events of interest. In the experiments in this section, we test the ability of the LVDBMS to correctly interpret and evaluate four continuous queries, i.e., whether the conditions in the video are true when the query indicates a true condition. Query results are tabulated by manually monitoring the videos taken from dataset (1) and the query results in the LVDBMS GUI, and recording the outcome every 5 s (by incrementing TP, TN, FP, or FN). Each query is evaluated over a 2-min period.

Relevant object | Privacy filter
Camera | ObjectScope=Static
User 1 | None
User 2 | ObjectScope=Dynamic

Fig. 15 Surveillance scenario illustrating the application of multiple layers of privacy filters to different types of objects

Fig. 16 Unobscured view from camera indicating mounted name plaque

Fig. 17 Video stream, as observed by User 1, with the static object obscured with a solid pattern. The manner in which an object is removed from the video stream (solid box or blur) is configurable

Fig. 18 Video stream, as observed by User 2, with blur (upper) and solid (lower) patterns obscuring objects in the video stream


As expected, the accuracy for queries involving a single video stream is extremely high. The accuracy of queries that correlate objects across multiple camera views is related to the accuracy of the underlying cross-camera tracking infrastructure, as reflected in the two cross-camera query experiments. The short (2-min) experiments allow only a few instances of each object to be observed and reflected in the index; however, even with only a few bags representing objects in the index, there were no FPs or false negatives (FNs) that could be attributed to mis-associating an object in one video with the wrong object in the other video. To determine query evaluation performance, we constructed four queries, two single-camera queries and two multi-camera queries, and present the results in Table 4.

    4.4.1 Query processing performance

The resolution of a continuous query is the frequency at which it is evaluated. In order to be usable in a real-time system, query processing must be completed within a bounded amount of time so that query evaluation does not become backlogged (and thus out of sync with the video image a user might observe) with respect to frames from streaming video, index updates, etc. We evaluate the performance of the query processing engine by simultaneously evaluating five queries for a period of 120 s over a random selection of ten videos. Figure 19 provides the results of evaluating the five continuous queries over each video, with results combined into a single plot. Table 5 presents summary metrics for the data plotted in Fig. 19, normalized per query (divided by five). The dataset each video came from is indicated after the video's name in the table.

Note that the cost to evaluate a query is a function of the input to its respective operators; some operators, such as AND and OR, implement short-circuit evaluation and only evaluate the second argument if the value of the first is insufficient to determine the operator result. The data reported in Table 5 are based upon five queries and one video. We repeated this experiment with eight simultaneous videos, and the performance results were relatively unchanged from those in Table 5. For the resolution of video utilized for this experiment, 8 is the maximum number of camera adapter instances that could be run on a 4-core host without the frame processing rate dropping to an unacceptable level (we consider approximately five frames per second manageable, but lower processing frame rates could lead to object segmentation and tracking errors, for example). Once the video stream has been processed by the camera adapter, the corresponding spatial processing layer host receives a stream of object size and position updates, and video frames. The frame-to-frame tracking and background segmentation processes that occur in the camera adapter processing pipeline are the most CPU-intensive stage in the data flow of the LVDBMS system. Compared to the video data received by the camera adapters, the quantity of data that flows to the spatial processing layer, and then to the stream processing layer, is substantially less at each phase. The spatial processing layer performs index updates and query evaluations, which are not CPU intensive, and in turn sends sub-query evaluations to the stream processing layer at the resolution of each query (e.g., once each second).

Fig. 19 Cost to evaluate five simultaneous queries in terms of CPU time. The plot shows CPU time in milliseconds versus elapsed time in seconds, with one series per video (OneShopOneWait1front, ShopAssistant2cor, SR436_M2U00040, TwoEnterShop1cor, TwoEnterShop1front, TwoEnterShop3cor, TwoLeaveShop1cor, TwoLeaveShop2cor, Walk2, WalkByShop1cor)

Table 5 Average query evaluation cost in terms of CPU time, per query, in milliseconds

Movie | Min | Max | SD | Avg
SR436_M2U00040 (3) | 0.40 | 5.60 | 0.73 | 0.78
OneShopOneWait1front (2) | 0.40 | 30.81 | 6.26 | 4.58
ShopAssistant2cor (2) | 0.40 | 21.24 | 2.72 | 2.49
TwoEnterShop1cor (2) | 1.60 | 12.00 | 1.68 | 2.42
TwoEnterShop1front (2) | 3.40 | 26.60 | 3.05 | 5.97
TwoEnterShop3cor (2) | 0.40 | 7.00 | 0.90 | 0.87
TwoLeaveShop1cor (2) | 1.40 | 15.60 | 2.75 | 3.47
TwoLeaveShop2cor (2) | 0.40 | 14.00 | 1.86 | 1.38
Walk2 (2) | 0.40 | 1.80 | 0.40 | 0.72
WalkByShop1cor (2) | 0.40 | 3.00 | 0.49 | 0.73



The data presented in Fig. 19 and Table 5 show results from a mixture of queries that evaluated within a period of time well below the query resolution of 1 s throughout the evaluation period. Query evaluation entails computing operator values, which requires operand lookups within index structures, and finally updating metadata for objects to indicate query targets. What we want to emphasize with these results is that, over a wide variety of input videos, query execution on average is well below the 1 s query resolution. Had query execution exceeded 1 s, query results would be out of sync with the video frames presented to the user via the client.

4.5 Multi-camera object tracking for privacy filter correctness

This section provides camera-to-camera tracking results from our tracking technique, which uses a multifaceted object model built from an object's appearances in video streams. An essential feature of the privacy framework is the ability to construct a query and use it to either include or exclude dynamic objects from a privacy filter. Thus, object tracking and cross-camera object tracking (when a privacy filter or corresponding query is formulated to make use of such functionality) correlate positively with privacy filter accuracy.

Figures 20 and 21 present accuracy results from two sequences of videos, from dataset (1), that involve tracking people across a three-camera setup in a laboratory environment as described in Sect. 4.1. In order to maximize the number of results to present, for this section we query the index for each observation of each object in each frame of video. That is, for each frame, we query the index for the first nearest neighbor of the query point (i.e., the object's feature vector) and return the result if it is sufficiently close; otherwise nothing is returned. If a result is returned and it is the correct object, true positive (TP) is incremented; else false positive (FP) is incremented. Likewise, if no result is returned, true negative (TN) is incremented if the object is not in the index; else we increment false negative (FN). Next, the bag corresponding to the object is updated to include the currently queried instance (based upon the cluster identifier assigned by the frame-to-frame tracker). This process is repeated for each frame in the video.

We present accuracy, computed from true positives (TP), false positives (FP), and false negatives (FN):

Accuracy = TP / (FP + FN + TP).

For example, TP = 20, FP = 0, and FN = 4 give 20/24 ≈ 83%, consistent with the cross-camera Appear query in Table 4.

(The accuracy equation does not consider TN because if an event does not occur it will not be detected. Furthermore, if an event does not occur and we claim that it did occur, that is considered a FP, which is a factor in the equation.)

As we see from the accuracy indicated in Figs. 20 and 21, initially the feature space is sparse and the bag representations contain few points (and thus small corresponding standard deviations along the various dimensions). As more observations are added to the bags in the index, the bag representations become more indicative of what we are likely to observe of a particular object in the future, and the accuracy stabilizes. The object-tracking technique we present is based upon the visual appearance of an object. Even though a FIFO queue is utilized to limit the duration of time an object is taken into consideration for tracking purposes, when many objects appear in video streams simultaneously, the likelihood increases that some of the objects will look sufficiently similar that they may be mistaken for one another, resulting in decreased accuracy.
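A sketch of this per-frame evaluation loop is given below; the Observation, Match, and IObjectIndex types are hypothetical simplifications of the LVDBMS index structures.

    using System.Collections.Generic;

    class Observation { public double[] FeatureVector; public int TrueId; }
    class Match { public int ObjectId; }

    interface IObjectIndex
    {
        Match NearestNeighbor(double[] query, double maxDistance); // null if nothing is close enough
        bool Contains(int objectId);
        void AddToBag(int objectId, double[] featureVector);
    }

    static class TrackingEvaluation
    {
        public static double Evaluate(IEnumerable<Observation> observations,
                                      IObjectIndex index, double maxDistance)
        {
            int tp = 0, fp = 0, tn = 0, fn = 0;
            foreach (var obs in observations) // one observation per object per frame
            {
                var match = index.NearestNeighbor(obs.FeatureVector, maxDistance);
                if (match != null) { if (match.ObjectId == obs.TrueId) tp++; else fp++; }
                else { if (!index.Contains(obs.TrueId)) tn++; else fn++; }
                index.AddToBag(obs.TrueId, obs.FeatureVector); // grow the object's bag
            }
            return (double)tp / (fp + fn + tp); // accuracy, as defined above
        }
    }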

    5 Related work

An LVDBMS encapsulates work from a multitude of domains, including continuous query language development and computer vision techniques such as object detection and tracking. For completeness, we include a review of recent video surveillance-related topics.

Fig. 20 Cross-camera tracking accuracy for sequence 44

Fig. 21 Cross-camera tracking accuracy for sequence 46

    5.1 Privacy considerations

As cameras become pervasive, improved video surveillance systems will be required to overcome the limitations imposed by direct and continuous human monitoring. This will result in increasing volumes of video that are processed, published, monitored, and stored. References [5, 10] suggest that privacy is a function of what is deemed culturally and socially acceptable by society. Several privacy-aware systems have been developed that can detect movement and mask it for privacy considerations. For example, in [27] pedestrians are obscured with multicolored blobs, where the color specifies a status, such as having crossed a virtual trip wire. Reference [15] develops an MPEG-4 transcoder and decoder to mask objects in a video stream based upon movement. While these systems increase privacy by masking an object's identity, they are not helpful in fighting crimes because the obscurity is irreversible. Furthermore, they do not provide functionality to determine whether an object should indeed be masked in the output video stream.

Large collections of data provide data mining opportunities for discovering global trends, decision making, capacity planning, building machine learning classifiers, etc. Data in its original form, such as hospital patient demographic data, contains information that can violate individual privacy. Privacy-preserving data publishing (PPDP) proposes algorithms to make data available for mining global trends while preserving individual privacy. These techniques range from monitoring the individual queries issued to perturbing the data in various ways. For example, an attacker might try to identify a patient's record in a public data set.

The majority of research on privacy control methods focuses on statistical databases containing tabular data. Security control methods generally entail query restriction, data perturbation, and output perturbation [2]. Query restriction entails monitoring queries, e.g., the number of queries submitted by a particular user, the amount of overlapping data queried by a user, etc. Data perturbation entails modifying data values stored in the database, such as replacing the ages of people with the average age by zip code. Output perturbation involves injecting error into the query result. Thus, there is a tradeoff between accuracy and confidentiality: inducing higher error lowers the likelihood of identifying particular data values but produces more skewed aggregate results.

When privacy filters are applied to video streams, the effect is a type of data perturbation. In an ideal scenario, the modified streams should not reveal anything about the individuals appearing therein [5]. However, [8] has shown that an absolute guarantee of privacy is unachievable in the presence of external auxiliary information. Some recent works, such as [26], investigate identity leakage through implicit inference channels, such as time of day combined with camera location. For example, if a camera shows an office door and one observes a blurred figure entering at 8 a.m. and leaving at 12 p.m., one can assume the obscured person and the person assigned to that office are the same. Thwarting this type of attack on privacy is beyond the scope of the method we propose in this study. Our primary aim is to make objects appearing in a video stream indistinguishable from one another in accordance with the current privacy specification. We note, however, that with our framework, identifiers such as office door numbers and placards can be defined as static objects and an appropriate privacy specification can be defined to redact them from the output video stream.

In this study, we present a flexible privacy framework with the goal of protecting individual privacy while providing data streams that can be queried for events as accurately as possible. Thus, we choose to perturb the output data in some ways (i.e., obscure objects with bounding boxes of varying degrees of tightness) but not others (such as skewing the video in the time domain, adding ghost objects to hide when real ones appear, etc.).

    5.2 Object detection and tracking

There are many existing video surveillance systems over networked cameras, e.g., [3, 12]. In particular, object recognition and tracking is a core component of these systems, forming a basis for high-level analytic functions for scene understanding and event detection. Since cameras have a limited resolution and field of view, multiple cameras may be required to provide coverage over the area of interest. Typically, the fields of view of adjacent cameras do not overlap due to economics, the environment, or computational constraints. These practical factors make tracking objects moving across multiple cameras a great challenge.

Existing multi-camera tracking environments [14, 18, 19, 21, 22, 28, 30, 32] require various types of calibration and/or information on the spatial relationships between the various cameras to be configured into the system as known parameters. They assume overlapping fields of view of the cameras, or non-random movement patterns. In the latter scenario, when an object moves from the field of view of one camera into that of the next camera, the object can be recognized in the second camera by taking into consideration the speed and trajectory of the object when it exits the field of view of the first camera [21, 22]. This strategy is only applicable to non-random movement patterns such as objects constrained by roads, walls, etc., and cannot be used for general-purpose a