
MarkIt: Privacy Markers for Protecting Visual Secrets

Nisarg Raval, Department of Computer Science, Duke University, [email protected]

Landon Cox, Department of Computer Science, Duke University, [email protected]

Animesh Srivastava, Department of Computer Science, Duke University, [email protected]

Ashwin Machanavajjhala, Department of Computer Science, Duke University, [email protected]

Kiron Lebeck, Department of Computer Science, Duke University, [email protected]

UbiComp ’14, September 13–17, 2014, Seattle, WA, USA. Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM 978-1-4503-3047-3/14/09 $15.00. http://dx.doi.org/10.1145/2638728.2641707

Abstract

The increasing popularity of wearable devices that continuously capture video, and the prevalence of third-party applications that utilize these feeds, have resulted in a new threat to privacy. In many situations, sensitive objects/regions are maliciously (or accidentally) captured in a video frame by third-party applications. However, current solutions do not allow users to specify and enforce fine-grained access control over video feeds.

In this paper, we describe MarkIt, a computer-vision-based privacy marker framework that allows users to specify and enforce fine-grained access control over video feeds. We present two example privacy marker systems: PrivateEye and WaveOff. We conclude with a discussion of the computer vision, privacy, and systems challenges in building a comprehensive system for fine-grained access control over video feeds.

Introduction

With increasing sophistication in understanding visual inputs and advancements in wearable gadgets, Augmented Reality (AR) has become a reality. Products like Google Glass and Kinect provide platforms on which developers can build and publish AR applications. These platforms (e.g., Google Play) also provide easy access to download and install such third-party applications. Many third-party


applications capture camera input and perform some processing on it. For example, Word Lens1, a text translation app on Google Glass, accesses a real-time video feed from Glass, detects text in the input, performs the required translation, and projects the translated video back onto the user’s view. Strictly speaking, a translation app only needs the textual content of the video, but instead it accesses the whole video even when no text is present! A malicious translation app can easily gather personal information about a user under the pretense of providing translation services. Even if the app is not malicious, it may inadvertently record personal information about the user. For example, a Glass user may accidentally share an embarrassing picture while using it.

Related Work: Privacy in the context of AR technologies has gained a lot of attention among researchers in the past few years.

Jana et al. [7] proposed the Recognizer abstraction for fine-grained access control in AR systems. The idea behind recognizers is to enforce least-privilege access control for third-party applications. However, it is unclear how to preserve global properties (e.g., background) using recognizers.

Scanner Darkly [8] is a privacy framework for context-aware AR applications. Darkly is integrated with the popular computer vision library OpenCV and provides a secure interface to all third-party applications. Darkly provides access control at an image level, as opposed to the object-level access control of recognizers.

Templeman et al. proposed the PlaceAvoider [18] system, which alerts the user when an application tries to capture images of sensitive locations (e.g., a bathroom).

In order to provide a personalized experience, many applications gather sensitive information for user profiling. With the Internet of Things [11] and life-logging cameras [6], the extent to which such information can be gathered is unimaginable. The always-recording feature of wearable devices can leak sensitive information like personal pictures, enterprise secrets, health information, etc. Although apps provide some level of functional access control, the user doesn’t have fine-grained control over what information is revealed and what information is kept private from these third-party apps.

Many privacy-preserving techniques have been proposed in the context of AR technologies (see sidebar for a detailed description of related work). Most of these techniques involve a privacy framework with exclusive access to sensor input. All third-party applications request sensor data through the privacy framework, which in turn grants these requests based on predefined access control policies. Current models for such a privacy framework can be

1. http://questvisual.com

classified into two categories. The “least privilege” approach allows an application to access only the information that is absolutely required for its functionality [7, 8]. Usually, an application requests an object of interest. Then, the privacy framework informs the user about potential information leakage and prompts for her permission. The application is given access to the object only if the user permits.

The “least privilege” approach works very well when the application's needs are clearly defined and confined to specific objects (e.g., location, face, etc.). However, this approach fails when application needs are either unspecified or very broad (e.g., background). For example, a navigation app like iOnRoad2 may require general properties of an image to detect roads, lanes, parking spots, etc. To handle such scenarios, another approach, “protecting secrets,” has been proposed, in which a set of secrets (sensitive objects) is defined and protected by the privacy framework [2, 3, 17]. Most of the proposed systems in this category protect only predefined sensitive objects. However, it is infeasible to construct an exhaustive list of potential sensitive objects. Also, the list of sensitive objects differs from person to person. To overcome these limitations, we propose the MarkIt privacy framework, whereby users can specify arbitrary secrets.

The novelty of our framework is in the use of privacy markers that allow users to specify an arbitrary sensitive region or object in the video feed. We describe two example markers: a special rectangle to protect secrets on two-dimensional surfaces (e.g., a whiteboard), and a virtual bounding box drawn using hand gestures to protect three-dimensional objects. The high-level idea is to continuously look for privacy markers in an incoming video feed. Once

2. http://www.ionroad.com


the marker is detected, the object enclosed by the marker is considered secret and is removed from the current as well as subsequent frames before the frame is passed to any third-party application. The aim is to provide the entire raw image, except sensitive objects, to third-party applications. Thus, an application can extract as much information as possible from an image without sensitive objects/regions being disclosed.
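The enforcement step described here can be sketched in a few lines. The sketch below is only a minimal illustration using NumPy; `boxes` stands in for the output of a hypothetical marker detector, and blocking is done by zeroing out each marked region before the frame leaves the trusted layer:

```python
import numpy as np

def redact(frame, boxes):
    """Zero out each sensitive region before the frame is handed to apps."""
    safe = frame.copy()
    for (x, y, w, h) in boxes:
        safe[y:y + h, x:x + w] = 0  # block the region enclosed by the marker
    return safe

# Hypothetical usage: detect_markers() would be the platform's detector.
frame = np.full((480, 640, 3), 255, dtype=np.uint8)  # stand-in camera frame
boxes = [(100, 100, 200, 150)]                       # one detected marker region
safe_frame = redact(frame, boxes)
assert safe_frame[150, 150].sum() == 0               # inside the box: blocked
assert safe_frame[10, 10].sum() == 255 * 3           # outside: untouched
```

In a real pipeline this redaction would run per frame, with the boxes updated by the tracker described later.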

Chaudhari et al. [2] used real-time audio distortion and visual blocking to protect the privacy of subjects in a video. A similar problem was solved by Schiff et al. [17] using color-based visual markers to detect and block faces. Enev et al. [3] proposed a novel approach of transforming raw sensor data such that the transformed output minimizes the exposure of user-defined private attributes while maximally exposing application-defined public attributes.

Recently, Roesner et al. [14] introduced new security and privacy challenges which need to be addressed in the context of AR technologies. They characterized these challenges along two axes: system scope and functionality.

Roesner et al. [15] argued that access control policies should be specified by the objects themselves in continuous sensing platforms. They also proposed a general framework for world-driven access control of sensor data.

We make the following contributions in this paper. First, we enumerate the desired properties of privacy markers. Then, using the marker approach, we propose two conceptual privacy systems suitable for two different use cases. Finally, we discuss the computer vision, privacy, and systems challenges in building a comprehensive privacy framework for fine-grained access control on video feeds.

Privacy Markers

In this section, we briefly describe our ongoing work on privacy markers to specify and enforce the privacy of visual secrets. We assume that the AR platform is trusted, but the applications that access the visual feed may be malicious. Our privacy specification and enforcement algorithms execute at the system level in the AR platform, while applications execute in user mode. This strategy allows us to enforce privacy before sending video feeds to third-party applications.

The high-level idea is to mark sensitive objects in the real world using specialized privacy markers that can be efficiently detected by the AR platform in real time. Once detected, the device blocks the sensitive object enclosed by the marker before passing the raw input image to third-party applications. These sensitive objects are protected throughout the subsequent frames. Thus, the system provides a privacy guarantee against malicious third-party applications as well as accidental recordings. For an accurate and efficient implementation of the system, the privacy markers should possess the following properties.

• Uniquely identify and localize arbitrary objects of interest.

• Accurate and real-time detection by AR devices.

• Unambiguously encode a wide range of privacy policies.

• Easy to create/remove and cost effective (e.g., drawing by hand, digital tools, or stickers).

• Invariant to occlusion, illumination, and viewing angles.

Two possible options for privacy markers are Quick Response (QR) codes [1] and Radio Frequency Identification (RFID) tags [16]. However, they are not suitable for tagging visual secrets, for the following reasons. Detecting a QR code and decoding the relevant information depends on uncontrolled aspects such as the clarity of the QR code in the camera view of an adversary. The problem with RFID tags is that their performance is influenced by metallic objects around them [9]. Also, they require pre-installed RFID readers on the devices.

Rather than depending on additional infrastructure, we show two examples of how simple human cues, coupled with off-the-shelf computer vision algorithms, can be used to specify and enforce the privacy of visual secrets. We consider two popular privacy-breach scenarios and briefly discuss potential privacy markers for them. Although these markers can encode various access control policies, in this paper we use the default policy of blocking the entire region enclosed by the marker. In the future, we plan to extend our markers to encode various access control policies.


PrivateEye

Consider a scenario where a sensitive product idea is being discussed on a whiteboard in an enterprise environment. Imagine an employee walking in with Google Glass. Even

Figure 1: Conceptual view of PrivateEye: (a) input image; (b) output image. The area enclosed by the marker (dotted rectangle within solid rectangle) is considered secret and blocked by the system.

though an employee may be trusted, it is hard to trust third-party applications running on Glass. This may lead to malicious or accidental leaks of confidential information written on the whiteboard. In order to prevent such leaks, we need a mechanism by which no private information (whiteboard content) can be accessed by third-party applications without prior authorization from the content owner. This mechanism also needs some means of defining private information (an area on a whiteboard). We develop a simple privacy marker, a “dotted rectangle within a solid rectangle,” to mark a two-dimensional area on a whiteboard (or presentation slides). Any content enclosed within this marker is considered private by the system. This marker, together with the algorithm for recognizing it, results in our first solution, PrivateEye.

With PrivateEye, we take the first step towards building a system that can provide privacy for two-dimensional regions against prying digital eyes. PrivateEye consists of two pieces: (1) a specification for marking a two-dimensional space as secret, and (2) software on a recording device for recognizing markings and obscuring visual secrets in real time. We investigate the possibility of using computer vision algorithms to identify visually sensitive information and process the visual data in real time. Figure 1 shows examples of how PrivateEye users can define a region containing visual secrets by combining solid and dotted lines. Depending on the medium, users can define secret regions by hand (e.g., on a whiteboard) or use digital tools (e.g., within a presentation). When sensitive visual information is in the camera view, the PrivateEye framework denies third-party applications access to that region.
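To illustrate the kind of processing involved, the toy sketch below localizes a rectangular marker in a synthetic grayscale frame simply by bounding its dark ("ink") pixels. This is only a stand-in for a real detector, which would have to find the nested dotted-within-solid rectangle pattern under perspective and lighting changes (e.g., via contour analysis in a library such as OpenCV):

```python
import numpy as np

def find_marker_bbox(gray, dark_thresh=128):
    """Toy marker localizer: bounding box of all dark pixels in the frame.
    A real detector would match nested rectangle contours instead."""
    ys, xs = np.where(gray < dark_thresh)
    if len(xs) == 0:
        return None  # no marker ink visible in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic whiteboard: white frame with a solid black rectangle outline.
frame = np.full((240, 320), 255, dtype=np.uint8)
frame[50, 80:201] = 0; frame[150, 80:201] = 0   # top and bottom edges
frame[50:151, 80] = 0; frame[50:151, 200] = 0   # left and right edges

x0, y0, x1, y1 = find_marker_bbox(frame)
assert (x0, y0, x1, y1) == (80, 50, 200, 150)   # recovered marker region
```

Everything inside the recovered box would then be blocked before the frame reaches any application.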

WaveOff

The scope of secrets protected by PrivateEye markers is limited to two-dimensional surfaces like whiteboards, wall paintings, presentations, etc. In many cases users may want to protect three-dimensional objects. One such use case is protecting a keyboard while typing a password. A malicious Glassware app can easily launch a shoulder-surfing attack just by recording the keyboard while the user is typing her password. Blocking the entire view or turning off the Glass during password entry may not be a good solution, as the view might be useful for some applications. For example, one can imagine a password manager that superimposes special indicators over the correct keys to help the user with a complex password [14].

We propose a novel approach of using hand gestures to mark sensitive real-world objects in real time. Using the proposed system, WaveOff, a user can draw a virtual bounding box in the air enclosing a sensitive object (e.g., a keyboard). The system detects the virtual bounding box and marks the enclosed object as sensitive. Henceforth, that specific object is blocked in all consecutive frames. WaveOff only protects the sensitive object as long as it is in the field of view. Once the object goes out of the camera view, it is no longer protected. The conceptual view of WaveOff is outlined in Figure 2. WaveOff relies heavily on computer vision algorithms for various tasks like detecting hand gestures, tracking sensitive objects, etc.
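To make the tracking step concrete, the sketch below follows a marked patch across frames with exhaustive sum-of-squared-differences (SSD) template matching. A deployed system would need a robust tracker such as TLD [10]; this toy version assumes the object moves rigidly and stays fully visible:

```python
import numpy as np

def track_ssd(frame, template):
    """Locate the protected object in a new frame by exhaustive SSD search.
    Returns the (x, y) top-left corner of the best-matching window."""
    th, tw = template.shape
    best, best_xy = None, None
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            window = frame[y:y + th, x:x + tw].astype(int)
            d = np.sum((window - template.astype(int)) ** 2)
            if best is None or d < best:
                best, best_xy = d, (x, y)
    return best_xy

rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(40, 40), dtype=np.uint8)
template = frame1[12:22, 8:18].copy()                # user-marked object
frame2 = np.roll(frame1, shift=(3, 2), axis=(0, 1))  # object moved next frame
assert track_ssd(frame2, template) == (10, 15)       # new location found
```

The recovered location would feed the blocking step, so the object stays redacted in every subsequent frame while it remains in view.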

Challenges

In order to successfully implement the proposed marker system, we need to address many challenges spanning various domains. Broadly, we categorize them into three areas: computer vision, privacy, and systems


challenges. In this section, we briefly outline those challenges and pointers to addressing them. Even though our paper focuses on the marker system, most of these issues are applicable to any system that protects visual secrets.

Computer Vision Challenges

Figure 2: Conceptual view of WaveOff: (a) marking the sensitive object; (b) blocking the marked object. The user uses hand movements to draw a virtual bounding box. The object enclosed by the bounding box is protected throughout all the subsequent frames.

The key component of the proposed marker system is identifying sensitive objects (or markers) in the input camera feed in real time. This task involves three important computer vision techniques: object detection, recognition, and tracking. The appearance of an object changes greatly with variation in illumination, pose, view, etc. This high variation in appearance makes both detection and recognition hard to solve. Even state-of-the-art computer vision algorithms fail to achieve high accuracy [4].

With the help of privacy markers, we are able to reduce the harder problem of detecting an arbitrary object to the much simpler problem of detecting a specific object (the marker). Furthermore, one can design a marker (irrespective of the sensitive objects) which is easy to detect using current object detection algorithms. However, the same technique cannot be applied in the case of recognition, because even with a fixed object, recognition in the wild is an extremely hard problem. If we restrict our system to only protect sensitive objects as long as they are in the field of view, then we can avoid recognition. However, if we want the system to remember previously marked sensitive objects, then we need to train our system to recognize objects.

Lastly, object tracking also faces various challenges like drifting, occlusion, etc. Some current vision algorithms attempt to solve these issues to a certain extent. For example, Tracking-Learning-Detection (TLD) [10] simultaneously tracks the object, learns its appearance, and detects it whenever it appears in the video.

Privacy Challenges

As explained in the previous section, the proposed system heavily uses various computer vision algorithms. Hence, the privacy guarantees of such a system depend highly on the accuracy of the corresponding computer vision algorithms. Although state-of-the-art vision algorithms are nowhere near perfect, we argue that even with a 99% accurate computer vision algorithm, it is hard to provide an arbitrary privacy guarantee. To understand this, consider a simple use case of face detection. To preserve a person’s identity, we blur all faces in the input video stream. More specifically, given a video stream, we run a face detection algorithm on every frame and blur every face detected by the algorithm. At a video capture rate of 25 fps, a face that appears for approximately 7 minutes appears in more than 10,000 consecutive frames. Even with a 99% accurate face detection algorithm, the probability that we miss the face in at least one of these 10,000 frames is very large (assuming a uniform failure rate). Revealing the face in even one frame reveals the identity of that person in the entire video. Although the above argument is ad hoc, as it does not account for the fact that adjacent frames are correlated (a face usually exists in consecutive frames), the argument that achieving an arbitrary privacy guarantee is infeasible even with a near-perfect vision algorithm is still valid.
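This back-of-the-envelope calculation is easy to verify (taking at face value the idealized assumption of independent, uniform per-frame failures):

```python
# Chance that a 99%-accurate per-frame detector misses a face at least
# once over a 7-minute appearance at 25 fps, assuming independent frames.
p_detect, fps, minutes = 0.99, 25, 7
n_frames = fps * 60 * minutes        # 10,500 consecutive frames
p_leak = 1 - p_detect ** n_frames    # P(at least one missed frame)
assert n_frames == 10500
assert p_leak > 0.9999               # a leak is all but certain
```

Since 0.99**10500 is on the order of 1e-46, the leak probability is indistinguishable from 1 under this model, which is exactly the point of the argument.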

In some cases, revealing a face in a single frame might be a breach of privacy, for example, when we want to keep the presence of a person secret. However, in other cases, revealing a face in a few frames might be acceptable. One example of such a scenario is lip reading from a video. Even if we reveal a few non-consecutive frames with faces, it is hard to identify the spoken words. Intuitively, we can

Page 6: MarkIt: Privacy Markers for Protecting Visual Secretsnisarg/pub/upside14.pdf · pictures, enterprise secrets, health information etc. Although, apps provide some level of functional

pose a question like “Which (or how many) frames need to be revealed in order to leak a specific secret?” The answer to this question (the span, or extent, of frames) can be used to characterize the secret. For example, if a person appears throughout the video, then revealing any frame reveals the existence of that person. Hence, the extent of the existence secret is very high. On the other hand, one needs to reveal many consecutive frames in order to infer spoken words. Thus, the extent of lip reading is relatively low. It is an interesting research direction to categorize visual secrets based on their spatial as well as temporal extent and analyze the relationship of extent with the privacy guarantee.

One of the distinctive features of visual secrets is that they are probabilistic, as opposed to traditional resources, which are deterministic in nature. Hence, we need a probabilistic access control mechanism to deal with visual secrets. Apart from their probabilistic nature, visual secrets are highly correlated. For example, if an object appears in a particular frame of a video, then with high probability the same object will exist in subsequent consecutive frames. Rastogi et al. [13] proposed an access control mechanism for probabilistic databases. However, we need a novel approach to designing access control mechanisms that account for both uncertainty and correlation among visual secrets.

Systems Challenges

Most of the AR devices available today are not equipped with high computing power3. On the other hand, many of the required vision algorithms are computationally intensive and sometimes need large storage (e.g., for training) as well. Thus, the question is how to perform real-time computation over the limited computational resources

3. Note that future AR devices may possess high computing power, but the corresponding algorithmic complexity will also increase.

provided by AR devices? The most popular solution to this problem is offloading: heavy computation is offloaded to server/cloud infrastructure [5, 7, 8, 12]. However, this poses several other challenges, like privacy/security issues in the cloud, the trade-off between computing time and network bandwidth, etc. One needs to carefully think about these issues in the context of the available resources.

The need for real-time computation stems from the requirements of a typical AR application. Many AR applications require instantaneous access to the captured video feed (e.g., Word Lens). However, certain applications, like photo sharing, need not get instantaneous input from the camera. It may be acceptable to introduce a minor delay to pre-process the video feed in order to remove sensitive objects. With more computing time, one should get a better privacy guarantee, as the input image can be thoroughly checked for sensitive objects. There thus seems to exist a trade-off between computing time and privacy. We believe that categorizing applications along the axis of delay tolerance will lead to a better understanding of this trade-off. This will in turn allow us to provide application-specific privacy guarantees.

Conclusion

We proposed a novel approach of privacy markers that enables users to define arbitrary sensitive objects in the real world. We also presented two ongoing projects, PrivateEye and WaveOff, to explain how privacy markers can be used to protect visual secrets. We note that the proposed system is not a complete solution. We discussed a number of computer vision, privacy, and systems challenges that need to be addressed to realize this vision, and hope this discussion fuels interesting interdisciplinary research in the future.


References

[1] Belussi, L. F., and Hirata, N. S. Fast component-based QR code detection in arbitrarily acquired images. J. Math. Imaging Vis. (2013).

[2] Chaudhari, J., Cheung, S., and Venkatesh, M. Privacy protection for life-log video. In SAFE (2007).

[3] Enev, M., Jung, J., Bo, L., Ren, X., and Kohno, T. SensorSift: Balancing sensor data privacy and utility in automated face understanding. In ACSAC (2012).

[4] Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.

[5] Ha, K., Chen, Z., Hu, W., Richter, W., Pillai, P., and Satyanarayanan, M. Towards wearable cognitive assistance. In MobiSys (2014).

[6] Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth, G., Kapur, N., and Wood, K. SenseCam: A retrospective memory aid. In UbiComp (2006).

[7] Jana, S., Molnar, D., Moshchuk, A., Dunn, A., Livshits, B., Wang, H. J., and Ofek, E. Enabling fine-grained permissions for augmented reality applications with recognizers. In USENIX Security (2013).

[8] Jana, S., Narayanan, A., and Shmatikov, V. A Scanner Darkly: Protecting user privacy from perceptual applications. In S&P (2013).

[9] Juels, A. RFID security and privacy: A research survey. IEEE Journal on Selected Areas in Communications (2006).

[10] Kalal, Z., Mikolajczyk, K., and Matas, J. Tracking-learning-detection. PAMI (2012).

[11] Metz, R. More connected homes, more problems. MIT Technology Review (2013).

[12] Ra, M.-R., Sheth, A., Mummert, L., Pillai, P., Wetherall, D., and Govindan, R. Odessa: Enabling interactive perception applications on mobile devices. In MobiSys (2011).

[13] Rastogi, V., Suciu, D., and Welbourne, E. Access control over uncertain data. In VLDB (2008).

[14] Roesner, F., Kohno, T., and Molnar, D. Security and privacy for augmented reality systems. Commun. ACM (2014).

[15] Roesner, F., Molnar, D., Moshchuk, A., Kohno, T., and Wang, H. J. World-driven access control for continuous sensing. Tech. Rep. MSR-TR-2014-67, 2014.

[16] Sample, A., Macomber, C., Jiang, L.-T., and Smith, J. Optical localization of passive UHF RFID tags with integrated LEDs. In RFID (2012).

[17] Schiff, J., Meingast, M., Mulligan, D., Sastry, S., and Goldberg, K. Respectful cameras: Detecting visual markers in real-time to address privacy concerns. In IROS (2007).

[18] Templeman, R., Korayem, M., Crandall, D., and Kapadia, A. PlaceAvoider: Steering first-person cameras away from sensitive spaces. In NDSS (2014).