Public Restroom Detection on Mobile Phone via Active Probingmfan/papers/iswc2014-fan.pdf · Public...

8
Public Restroom Detection on Mobile Phone via Active Probing Mingming Fan, Alexander Travis Adams, Khai N. Truong Department of Software and Information Systems University of North Carolina, Charlotte {mfan, aadams85, ktruong8}@uncc.edu ABSTRACT Although there are clear benefits to automatic image capture services by wearable devices, image capture sometimes happens in sensitive spaces where camera use is not appropriate. In this paper, we tackle this problem by focusing on detecting when the user of a wearable device is located in a specific type of private space—the public restroom—so that the image capture can be disabled. We present an infrastructure-independent method that uses just the microphone and the speaker on a commodity mobile phone. Our method actively probes the environment by playing a 0.1 seconds sine wave sweep sound and then analyzes the impulse response (IR) by extracting MFCCs features. These features are then used to train an SVM model. Our evaluation results show that we can train a general restroom model which is able to recognize new restrooms. We demonstrate that this approach works on different phone hardware. Furthermore, the volume levels, occupancy and presence of other sounds do not affect recognition in significant ways. We discuss three types of errors that the prediction model has and evaluate two proposed smoothing algorithms for improving recognition. Author Keywords Restroom detection, room impulse response, active probing, pattern recognition ACM Classification Keywords H.5.5. [Sound and music computing]: Signal analysis, synthesis, and processing INTRODUCTION Wearable cameras produce personal image-based records which can be used in a variety of ways. For example, researchers have used such records to investigate health behaviors (such as exercise and diet [9, 10]), help people with memory loss recall past events [8], increase parental understanding of the needs of children with autism [15, 18], and improve everyday memory and social skills for children with disabilities [1]. Although there are demonstrated benefits to wearing an always-on and automatically recording personal camera, there are also documented concerns of recording others, particularly in sensitive spaces [1, 3, 4, 9, 10, 11]. As a result, many researchers and users have expressed a need for a mechanism to temporarily disable capture. However, even when manual “privacy buttons exist and wearable cameras can be removed, it is not uncommon for participants to report that they forgot they were wearing the unit. Therefore, the participant inadvertently might collect inappropriate images, such as going to the bathroom” [11]. “With thousands of images automatically recorded every day, … [the user] only deletes unwanted images if he comes across them, as searching for them would take too much time” [4]. Therefore, how to turn off wearable cameras automatically in sensitive or private spaces is an important research problem. We tackle this problem by exploring how to detect a specific type of private space where image recording is socially inappropriate—the public restroom. Many researchers have identified the restroom as a specific type of space where they want to suspend capture (e.g., [1, 3, 4, 11]). We focus on public restrooms, in particular, because of the potential for others to be recorded in the captured images there. This problem is challenging for two reasons. First, infrastructure-dependent indoor localization approaches (e.g., cellular, WiFi, and visible light) depend on the infrastructure coverage and floor maps to identify a restroom’s location. Infrastructure-independent indoor localization approaches (e.g., inertial sensors on phone) would still require floor maps in order to reason and determine if the user’s location is inside a restroom. However, such localization methods fail when the user is outside of an infrastructure’s coverage or at a location where floor maps have not yet been developed. Alternatively, video or image based approaches can be employed to detect restrooms [17, 19, 20, 25, 26]. Unfortunately, vision based techniques can sometimes miss signage located immediately outside the space [26]. These methods still can be used inside the space to detect the presence of objects, construction material, and fixtures commonly found in restrooms to reason that must be where the user is located (e.g., [19]); however, this violates the original motivation of not wanting recording to happen there in the first place. Furthermore, recording must still be on to determine when the user has left the space in order to resume the archiving of captured images. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ISWC'14, September 13 - 17 2014, Seattle, WA, USA Copyright 2014 ACM 978-1-4503-2969-9/14/09...$15.00. http://dx.doi.org/10.1145/2634317.2634320 27 ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA

Transcript of Public Restroom Detection on Mobile Phone via Active Probingmfan/papers/iswc2014-fan.pdf · Public...

Public Restroom Detection on Mobile Phone via Active Probing Mingming Fan, Alexander Travis Adams, Khai N. Truong

Department of Software and Information Systems University of North Carolina, Charlotte {mfan, aadams85, ktruong8}@uncc.edu

ABSTRACT Although there are clear benefits to automatic image capture services by wearable devices, image capture sometimes happens in sensitive spaces where camera use is not appropriate. In this paper, we tackle this problem by focusing on detecting when the user of a wearable device is located in a specific type of private space—the public restroom—so that the image capture can be disabled. We present an infrastructure-independent method that uses just the microphone and the speaker on a commodity mobile phone. Our method actively probes the environment by playing a 0.1 seconds sine wave sweep sound and then analyzes the impulse response (IR) by extracting MFCCs features. These features are then used to train an SVM model. Our evaluation results show that we can train a general restroom model which is able to recognize new restrooms. We demonstrate that this approach works on different phone hardware. Furthermore, the volume levels, occupancy and presence of other sounds do not affect recognition in significant ways. We discuss three types of errors that the prediction model has and evaluate two proposed smoothing algorithms for improving recognition.

Author Keywords Restroom detection, room impulse response, active probing, pattern recognition

ACM Classification Keywords H.5.5. [Sound and music computing]: Signal analysis, synthesis, and processing

INTRODUCTION Wearable cameras produce personal image-based records which can be used in a variety of ways. For example, researchers have used such records to investigate health behaviors (such as exercise and diet [9, 10]), help people with memory loss recall past events [8], increase parental understanding of the needs of children with autism [15, 18], and improve everyday memory and social skills for children with disabilities [1]. Although there are demonstrated

benefits to wearing an always-on and automatically recording personal camera, there are also documented concerns of recording others, particularly in sensitive spaces [1, 3, 4, 9, 10, 11]. As a result, many researchers and users have expressed a need for a mechanism to temporarily disable capture. However, even when manual “privacy buttons exist and wearable cameras can be removed, it is not uncommon for participants to report that they forgot they were wearing the unit. Therefore, the participant inadvertently might collect inappropriate images, such as going to the bathroom” [11]. “With thousands of images automatically recorded every day, … [the user] only deletes unwanted images if he comes across them, as searching for them would take too much time” [4]. Therefore, how to turn off wearable cameras automatically in sensitive or private spaces is an important research problem.

We tackle this problem by exploring how to detect a specific type of private space where image recording is socially inappropriate—the public restroom. Many researchers have identified the restroom as a specific type of space where they want to suspend capture (e.g., [1, 3, 4, 11]). We focus on public restrooms, in particular, because of the potential for others to be recorded in the captured images there. This problem is challenging for two reasons. First, infrastructure-dependent indoor localization approaches (e.g., cellular, WiFi, and visible light) depend on the infrastructure coverage and floor maps to identify a restroom’s location. Infrastructure-independent indoor localization approaches (e.g., inertial sensors on phone) would still require floor maps in order to reason and determine if the user’s location is inside a restroom. However, such localization methods fail when the user is outside of an infrastructure’s coverage or at a location where floor maps have not yet been developed. Alternatively, video or image based approaches can be employed to detect restrooms [17, 19, 20, 25, 26]. Unfortunately, vision based techniques can sometimes miss signage located immediately outside the space [26]. These methods still can be used inside the space to detect the presence of objects, construction material, and fixtures commonly found in restrooms to reason that must be where the user is located (e.g., [19]); however, this violates the original motivation of not wanting recording to happen there in the first place. Furthermore, recording must still be on to determine when the user has left the space in order to resume the archiving of captured images.

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. Copyrights forcomponents of this work owned by others than ACM must be honored.Abstracting with credit is permitted. To copy otherwise, or republish, topost on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. ISWC'14, September 13 - 17 2014, Seattle, WA, USA Copyright 2014 ACM 978-1-4503-2969-9/14/09...$15.00. http://dx.doi.org/10.1145/2634317.2634320

27

ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA

Iacaurlotrsdsovpptys

TAae[wsmsasafctuatsce

Tc(c

In this paper, approach, whicommodity moanalyze the imuses active prorather than a ocations fromrained. We de

spaces and thatdiscuss how thsignificant waobtrusiveness volume. We prediction perfpresence of otypes of errors

smoothing algo

THEORY OF OAn important aand characterienvironments a24]. IRs conta

which is the resound, decay timethod for capsweep over a pacoustics of aspecified amouanalyzed to ufrequency ovecommonly usedune their syste

any undesired echnique as an

space. IRs are acreate a digitalenvironments.

The acoustic contingent on (absorption coconstant factor

Figure 1. (Left)

we present ich uses the obile phones t

mpulse responsobing to detec

specific spacm which a roemonstrate that the approach whe volume leveays. Thereforof the sweep also show thformance despther sounds in that our SVM

orithms to impr

OPERATION and commonlyize the acouand materials ain time-domaeverberation anme, early refle

pturing the IR opredetermined a room [14]. Tunt of time aftunderstand ther time in an d by audio engems in order to

feedback or n analysis tool also used in col representation

characteristicits dimensions

oefficient). Thrs including t

Wearable phone

an infrastructhardware alr

to emit a prose (IR) of a sct restrooms (ace amongst a om type clas

at our model cworks on diffe

el does not affere we can

by outputtinghat our modepite of the occn restrooms. WM model has arove the predic

y used measureustic propertieis the Impulse

ain acoustic prnd its compon

ection, and echof a room is tofrequency ran

The sine wavfterwards is rece behavior o

environment.gineers to help o enhance the s

noise. Acousto aid them w

onvolution reven of the acous

cs of an ens and ability t

his is a functhe shape, siz

e attachment for

ture-independeready found obing sound anpace. Our woa type of spac

defined set ssifier has becan predict neerent phones. Wfect the model

minimize thg it at a lowel maintains icupancy and th

We discuss thrand propose twction accuracy.

ement to analyes of differee Response (IRroperties, one nents (i.e., direoes). A commo use a sine wav

nge to excite thve sweep and corded and th

of each audibIR analysis

them design ansound and avosticians use thwhen designingerb processing stics of differe

nvironment ato absorb soun

ction of severze, constructio

r performing anduse

ent on nd

ork ce) of en ew We

in he

wer its he

ree wo

yze ent R) of

ect on ve he

a en

ble is

nd oid his g a to

ent

are nd ral on

materiadesks).and pecontinuaffect paramereflecti

Becaushave Howevdue touniquehas strserve th

In bothaffordaThese toilets,privatesimilarrestroothe phorestroothe surhave si

Our syspaces acoustiprocesto trainuser isexistinmethodenvironnoise o21]. Aclassifibegins and han

d analyzing impued in our study.

als, and object. There are al

ercentage of ocuously over t

reverberationeters of soundion [23].

se no two enviunique acou

ver, many cono its purpose. e qualities fromrong correlatiothe same purpo

h the public anances that greaffordances in

, and sinks. We restrooms havr acoustic reoms from that one). This is p

oms but can alsrfaces and theimilar absorpti

ystem leverage(in particular

ics via IR. Thsed to extract n a classifier ts located. Thisng computationds which extrnment or humor sink usage) An active p

fication of restrto use the spa

and washing).

ulse response and

ts inside of thlso many variaccupancy, that time. These

n time, but d such as dif

ironments are eustic charactenstructs have v

One particulam other types oons between siose, is the restro

nd private spaceatly impact tnclude water re

While public reve showers/tubesponses and response (eve

partially due tso be attributede items found ion coefficients

es the natural ar the restroom)he IR of diffeacoustic featu

to identify the s active probinnal auditory scract features a

man generated collected fromprobing apprtrooms to happace (e.g., urinat

d image capture

he construct (eables, such ascan change thcharacteristics they also af

ffraction, refra

exactly the sameristics or fivery similar far environmenof environmenimilar environoom.

e, restrooms hathe acoustic fesistant floors estrooms havebs, they both d

can be ideen to the humato the commond to the materiain a restroom s [14]).

acoustic traits o) by exciting t

erent spaces caures which are

type of spaceng approach dicene recognitioand trains cla

sounds (suchm the target sceroach allows pen even beforting, defecating

; (Right) 8 samp

e.g., chairs, s humidity, e acoustics

not only ffect other action, and

me, they all ingerprints. fingerprints nt that has ts, yet also

nments that

ave similar fingerprint. and walls,

e stalls and emonstrate

entified as an ear over n layout of als used on (which all

of different the natural an then be e then used where the iffers from on (CASR) assifiers on h as traffic enes [2, 16,

for the re the user g, flushing,

ple restrooms

28

ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA

A[moBRlouSsvth

S

ImTapam2woththsa

F

Active probing13,22]. Kunze

mobile phone’sones by using vBy emitting a Rossi et al. [22ocation among

use a Sine WSequence (MLsweep exhibitsvariance of thehe goal of reco

SYSTEM IMPL

mpulse RespoTo perform theand the speakephones (see Fapplication thamicrophone, an20Hz – 20kHzwave sweep stone second, allhe measured she IR at 44.1 K

spectrograms oa restroom and

Figure 2. (a) Imp

has been exple and Lukowiczs symbolic locavibration and shsound and an2] were able tg 20 rooms at 9

Wave Sweep iLS) for active s better tolerane probed spaceognizing space

LEMENTATION

onse Measuree IR measuremer already avaFigure 1). Wat first starts nd then it outpz from the butops, the appliowing it to capspace. The meKHz with 16 b

of IRs collecteda non-restroom

(

(pulse response o

of an

ored as a localz [13] showedation from a sehort narrow fre

nalyzing the imto classify a u98% accuracy. instead of Maprobing, becaunce to nonline

es [6]. This allotype.

N

ement ment, we use ailable on com

We developed recording w

puts a sine wauilt-in speakercation continupture fully the easurement appbit depth. Figud on a Nexus 4m space.

(a)

(b) of a restroom; (b)n office.

ization approad how to detectet of pre-definequency ‘beeps

mpulse responsser’s room-levIn our work, w

aximum Lenguse a sine wavearity and timows us to tack

the microphonmmodity mobia measureme

with the built-ave sweep fro. After the sin

ues to record freverberation

plication recorure 2 shows ho phone differ f

) Impulse respon

ch t a ed s’. se, vel we gth ve

me-kle

ne ile ent -in om ne for of

rds ow for

Stans emeasurTime-Scompachoice not reqcreatinhave vunoccuoptimaFurthertypicalmatchesweephumanable tosweeps0.1s duacousti

FeaturThe mbefore after than IRmeasuris a vapplicaand whstarts asweep:size Wtwo wieach wFFT m

(K1…N, frames∗∗of thProcessweep Howevbecausat diffeoptimaperformitself +sectionthrough

For thefor feacoefficcommo

nse

et al. previouslrements (MLSStretched Pulsarison, they fou

for non-occupquiring any tedng a robust movery few peopupied. This enally, making it rmore, a comlly optimized fes the ideal fr

to capture IRns, we want it o produce a s of 0.01, 0.1 uration providics of a space a

re Extractionmeasurement ap

it outputs the he sweep has c

R. The IR isrement applicavariable latenation requests hen it actuallyat an unknown: 1) Divide the

W = 1024 samindows have 5

window using tmagnitudes forK = 0,…,W-1)

N:# of frames S inside the. % . 4) Estimat

he sweep u∈ … ∑ssing the rest

guarantees ver, it may se the IRs captferent rates. Wal amount of tm feature extr+ 0.3 s aftern, we describe h tests of diffe

e identified opature extractioncients (MFCConly used as f

ly compared diS, Inverse Reses, and Sine und that the sinupied spaces. Idious calibratioobile system) [2ple coming an

nables the sine t the best optiommodity mobifor the range orequency rangeRs. Because tto be as shornoticeable proand 1s in dur

des the best band minimizing

pplication intensweep and sto

completed to fpresent in t

ation begins toncy between w

the OS to outpy plays. Due t

n time point. Toe IR into frame

mples (23 milli0% overlap withe Hanning fur each windowFFT magnitu

es in a sweep e sweep itsel

ate the index to

using the f∑ ,of the recordithat the comalso process tured in differe

We have foundthe recording raction on is 0r the sweep s

the process foerent durations

ptimal amountn, we extract thCs) from eachfeatures in the

ifferent impulsepeated Sequen

Wave Sweepne wave sweept also has the

ons (which is im24]. Restroomnd going, and

wave sweep on to use for IRile phone’s haof human heare for using a the sweep is rt as possible yobing effect. ration and founalance for capg the intrusiven

ntionally startsops recording ofully capture ththe recording

o output the swwhen the meput the sine wto this delay, o estimate the es using a slidiniseconds). Eacith each other.

function. 3) Caw. , repr

ude of the winrecord). The

lf is calculate

o the window

following op, .

ing after the smplete IR is

extraneous inent types of spd experimentalafter the swee

0.4 s (0.1 s of tops). In the or determiningafter the start o

t of the recordhe mel-frequenh window. Me speech recog

se response nce (IRS), p). In this

p is an ideal benefit of

mportant in s generally

d are often to perform R analysis. ardware is ring, which sine wave audible to yet still be We tested nd that the pturing the ness.

s recording one second he effect of

after the weep. There easurement

wave sweep the sweep start of the ng window ch adjacent 2) Smooth

alculate the resents the

ndow j (j = number of

ed as:

at the start

ptimization:

start of the analyzed.

nformation aces decay ly that the

ep starts to f the sweep

evaluation g this value of sweep.

ding to use ncy cepstral MFCCs are gnition and

29

SESSION: CONTEXTUAL AWARENESS ON MOBILE DEVICES

cimaufc

u

o

CCgCaHthacpr

Bcm(cmcorbbtyL

PUr1baaciresor

Fpbthruah

computational mplementation

applying 23 Muse the rest to final step, we acalculating the

using the follow

of frames in the

Classification Classifying resgenerally referClassification, are interested iHowever, one-heir prediction

algorithms (Hclassification inpredict a new srecalls for the r

Because restrooconform to bmaterials, and h(e.g., toilets, wconsistency in might be highlclass. Furthermoutside of restrrestroom data tboundary. Thusbinary-class clype as either

LibSVM [5] fo

PERFORMANCUsing a Galaxyrestrooms from1960s – 2006)boundary, weattempted to saas possible (e.gclassroom, shops hardly poss

restroom spaceevaluation. Infshown in the fiof places wherestroom data.

For each spaceperceived to subuilding architehrough and int

restroom, the tyurinals / toilet and then back held the phone

auditory scenen, we calcula

Mel filters, remform a feature

aggregate all the mean values

wing equation:

e optimal amou

trooms vs. nonrred in machbecause restro

in identifying a-class classifiens. We have triHempstalk et

n LibSVM [5sample as “unkrestroom class.

oms are built tbuilding codehave similar mwash basins), the restroom ly distinguisha

more, because prooms, we can to help the clas, we decided tlassification, wr restroom orr classification

CE EVALUATIy Nexus, we com 49 different. To help the

also collectample as diversg., hallway, elp, and bus statsible to coveres that a userformation aboirst row of Tabere we collec

e, we collectedupport circulaecture to referteract with the ypical circulatistalls, the wasto the door. W

e still in one

e recognition [2ate the first ove the first M

e vector: , ,he MFCCs of as of each one

: ∑ ,

unt of the recor

n-restrooms falhine learning oom is the targamong all the

ers tend to be ied two one-cla

al. [7] and5]), and found known” and the

o serve the sames, use simil

materials insidethere should

class, which aable from the people spend measily collect a

assifier learn tto treat restroowhich predictr non-restroomn in our evaluat

ON ollected IR dat

buildings (bumodel learn tted non-restrose a set of non-levator, lockertion). Howeverr all differentr can potentiaout our colleble 1. Figure 3 cted the restr

d 30 samples aation. Circulatir to the way thspace. For instion would be fshstand, the paWhile collectihand in front

2, 16, 21]. In o13 MFCCs b

MFCC, and th, … , , . In thall the frames b

of the MFCC

(k=1…12, N:

rding).

lls under what as One Cla

get class that wpossible spaceconservative

ass classificatiod the one-cla

that they ofterefore yield lo

me functionalitlar constructioe of the construd be high innat the same tim

“non-restroommuch more tima variety of nothe classificatioom detection asts current room. We leveragtion.

ta for 103 publuilt between ththe classificatiooom data. W-restroom spacr room, outdoor, we note thatt types of noally visit in oected dataset

shows the typroom and no

at different spoion is a term hat people movtance, in a menfrom the door

aper towel racking the data, w

of the chest

our by en he by Cs

: #

is ass we es. in on ass en

ow

ty, on uct ner me m” me on-on s a om ge

lic he on

We ces or, t it on-our

is pes on-

ots in ve

n’s to

ks, we to

simularemainWe tooand thanyway

OptimaAs preslightlyThis grecordithe staextractstart orecordiextractinterva(0.1~1featureand peclassifiperformduratioof the that moare leswe useon the the swsweep

EfficacTo assspacesnumbeUsing

Figu

Figure 4duration

ate wearing thened stationary ok care to holdhe microphony.

al Amount of eviously descry before the sguarantees thaing. We can reart of the recoting features fr

of the recordining after the tion, we selecal after the sw s). For each d

es based on all erformed a 10

fier. The resultmance measuon after the swrecording to u

models trained wss accurate. Thed this finding 0.4 seconds po

weep (0.1 s swhas stopped).

cy of Model insess the model’, we evaluate

er of restroomthe Galaxy

ure 3. Different t

4. Five measurns from the start

e device arounwhile recordind the phone su

ne were not

Recording forribed, the applsweep and conat the full IR emove extraneording and therom the start ofng. To identifystart of the swcted 10 differ

weep has stoppduration (D = 7474 recordin

0-fold cross vts are shown iurements indiweep has stoppeuse for feature with longer duherefore, in al and performeortion of the re

weep itself +

n Classifying N’s efficacy in ced how the m

ms used for traNexus datase

types of places w

rements of the of the sweep for

nd the neck. Tng the impulseuch that both tcovered or b

r Feature Extrlication beginsntinues for a fu

is capturedeous informatioe start of the f the sweep insy the optimal weep to use frent durationsed after the sw0.1,…,1.0), we

ngs by the Galavalidation within Figure 4. Acate that 0.3ed is the optimextraction. It a

urations than 0ll following e

ed feature extraecording after 0.3 s duration

New Restroomclassifying new

model convergaining (traininget (103 restro

where data was c

model using 1r the feature extr

The phone e response. the speaker blocked in

raction s recording full second. within the

on between sweep by

stead of the amount of for feature s at 0.1 s weep stops e extracted axy Nexus, h SVM as

All the five 3 seconds

mal amount also shows

0.3 seconds evaluations, action only the start of n after the

m Spaces w restroom ged as the g set size). ooms), we

collected

10 different raction.

30

ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA

gupNcnnN

I

athbwdps

ApTacbths

GDthdwoasdocNc

Wc

Fn

gradually increused for trainprocedure workN restrooms frochose the samnon-restroom IRnon-restroom aN restroom is

IRs chosen wa

as the classifierhese

by the strategy we repeated thdifferent restroperformances fsize. The final r

As the numbperformance beThe weighted Fand recall of bconverged at ~between 0.91 ahan 40 restroo

seen enough va

GeneralizabilitDifferent phonehe microphone

distances and inwe must also von different phalso collectedsamples on a Ndata collected oone collected confirm the reNexus data. Tcollected on the

We performed classifier for ea

Figure 5. Measurnumber of restroo

eased the numbning. For eachked in this manom the 103 rese percentage Rs. Thus, if the

are , , . Therefore

s: r and performe

IRs. Fourth,of choosing N

he procedure fooms each roufrom the 10 roresults are show

ber of trainiecame more stF-Measure, whboth the restro~0.93. The wand 0.93 while oms. This sugariations of rest

ty of the Approes may use dife and the spean different posalidate that our

hones. In additd additional Nexus 4, an HTon these deviceon the Galax

esults obtainedTable 1 summe four phones.

a 10-fold crosach phone’s d

rements of the moms used for trai

ber of restroomh number N, nner. First, we strooms. Seconof non-restrooe total numberthen the numb, the number ∗ . Third

ed a 10-fold cro, to remove v

N spaces for trafor 10 rounds und. Finally, wounds for eachwn in Figure 5

ing set size table and graduhich incorporatom and non-r

weighted F-Methe model wa

ggests that thetrooms at that p

oach across Pfferent hardwaaker can be plasitions from onr approach gention to the Garestroom andTC One, and aes were not as y Nexus, but d from analyzmarizes the a

ss validation usdata separately.

model’s convergining

ms (N=1,…,10the evaluatio

randomly chond, we randomom IRs from ar of restroom anber of IR in theof non-restroo

d, we used SV

oss validation ovariations causaining at random

by choosing we averaged thh N training s.

increases, thually convergetes the precisioestroom classeasure fluctuat

as trained on lee model had npoint yet.

Phones are. Furthermoraced at differe

ne another. Thuneralizes to woalaxy Nexus, wd non-restrooa Galaxy S. Thextensive as ththey helped

zing the Galaxamount of da

sing SVM as th. The results a

gence with differ

03) on

ose mly

all nd

ese om

VM

on ed m, N he set

he ed. on es, ed

ess not

re, ent us, ork we om he he to xy ata

he are

shown modelsat betw

Unfortsoftwaon difcaptureother. dataset10-foldthat thephonesas welMeasu

Effect We tesrestroorestroothe Ga

Phone Na

Galaxy NNexus 4HTC OneGalaxy S

Figure validatiorestroom

rent Figure 7other thr

in Figure 6s for the four d

ween 0.92 and 0

tunately, diffeare optimizatiofferent phoneed by each phWe applied tht and tested ond cross validate extracted MFs. Phone-indepll as phone-d

ure between 0.4

of Occupancysted the model’om and soundom (e.g. urinatialaxy Nexus, w

Table 1. Sampl

ame # rest

exus (GN) 103522020

6. Model Geneon Results on fom)

7. Model trainedree phones’ data

. It highlightsdifferent phone0.98 (weighted

erent hardwarns for the mics mean that

hone are drastiche model trainn the other thrtion results, shFCCs features pendent classifdependent clas43 and 0.63).

y & Sounds on’s robustness ad generated bing, flushing, awe collected ne

le data collected

of rooms

#of rdata sa4258 2230 600 600

eralization acrour phones separa

d on Galaxy Nexaset (R: restroom

s that the claes perform simd F-Measure).

re settings acrophone and t

the impulse cally different

ned on the Galree phones’ dahown in Figur

do not generafication does nssification (we

n the Model gainst the occuby the occup

and hand washiew data from

on the four pho

estroom amples

# of ndata s32161296523573

ss phones: 10-ately (R: Restroo

xus dataset and tm; N: non-restroom

assification milarly well

along with the speaker

responses from each

laxy Nexus ataset. The e 7, reveal

alize across not perform eighted F-

upancy of a pants in a ing). Using a restroom

nes

on-restroom samples

-fold Cross-om; N: non-

tested on the m).

31

SESSION: CONTEXTUAL AWARENESS ON MOBILE DEVICES

wfoo53eanlaoass

EWmmc0u

FethrthWu~awn

Wc8alrimdh

Fle

with two urinafunctional spotoccupancy ratoccupied by p57%, 71%, 86%30 additional IRearlier in the paall seven casenormal mannerarge Galaxy N

of Table 1) to tand 86% occusample each (asamples for the

Effect of SweeWe evaluated model to explominimized by collected 30 sa0.75 of the musing the Galax

For comparisonenvironment sohe IR for the sw

recording betwhe sweep (see

We note that ausing this app~0.15s in lengthat other volumwave sweep, wnormally used t

We performedcollected at eac8 demonstratesafter an outputevels perform

recordings witmprovement in

data captured higher volume

Figure 8. Measuevels. Volume 0

als, two toilet s in total) for te as the pereople. Therefo% and 100% oR samples usinaper for each os, people sim

rs. We used theNexus data corptest the new coupancy rates, accuracy: 97% remaining fou

ep Volume on the influence

ore if the obtruoutputting the

amples at 4 vomax volume) fxy Nexus.

n, we also extound without ouweep at volumeen the start oFigure 2) as ea

a limitation andproach exists bh each, in comes. In this inst

we did not needto capture the I

d a 10-fold cch volume leves that models t sweep outputmed better ththout a sweepn accuracy. Awhen the apps generally pe

urements of the0 uses only the p

stalls, and threthe evaluation.rcentage of fore, we had 1occupancy rateng the same m

of the seven occmulated using e SVM classifipus (describedllected data. Fothe model m

%), and correctur occupancy ra

the Model of the sweep

usiveness of thsweep at low

olume levels (0from 13 addit

tracted MFCCsutputting a swe

me 0. We used tf the recording

ach space’s envd potential threbecause the e

mparison to the tance, because d to include anIR after the swe

cross validatioel. The results trained on re

tted at any of han the mod

p; there was additionally, mplication outpuerformed bette

e model using 5ortion without a

ee basins (sev. We defined thfunctional spo4%, 29%, 43%es. We collect

method describcupancy rates. the restroom

ier trained on thd in the first roor the 14%, 29

misclassified ontly classified aates.

volume on thhe sweep can b

wer volumes. W0.1, 0.2, 0.5 antional restroom

s using only theep; this acted the portion of thg and the start vironment souneat to validity extracted data

0.4 s recordinthere is no sin

n additional 0.3eep plays.

on on the dashown in Figu

ecordings of IRthe four volumdel trained oa 10% or moodels trained outted sweeps er than those

5 different volusweep.

en he ots %, ed ed In in he

ow 9% ne all

he be

We nd ms

he as he of

nd. in is

ngs ne

3 s

ata ure Rs me on

ore on at at

lower after loare stilvolumethe swactive p

ContinIn thiscollectworksGalaxycollectsound automa

We codays. Wa resuenabledThe teapplypredictthe Feaclassifirestroo

ClassifWe evspaces differedays’ wall the groupecombinrespect

ume

Table 2Day

1 2345

Figure 9of days’

volumes. Althower volumell comparable e sweeps. This

weep at a loweprobing is less

nuous In-the-Ws section, we ption to evaluatefor realistic sc

y Nexus arountion application

at 0.3 of atically repeate

ollected data foWe collected d

ult the restroomd us to test the

emporal continthe temporal tions later. Weature Extractio

fier. Table 2 sumom data and 15

fying Spaces valuated the mo

in-the-wild.ent days, the mworth of data. possible comb

ed together. Wnations for 1,tively. We use

2. Sample data co# of re

943411

9. Measurements data.

hough models sweeps lose sin performan

s suggested thaer volume in s perceivable an

Wild Samplingperformed cone how well ou

cenarios, such nd the neck (n plays a 0.1 the max vo

ed this procedu

or about 2 houdata in differenms were all d

e model’s abilitnuity of each d

optimization e followed theon section andmmarizes the c

503 restroom da

odel’s ability tBecause we

model could be For each numb

binations of difWe had 5, 10, 2, 3, 4 and ed a 10-fold cr

ollected in-the-westrooms #of r

data s378 171 141 109 704

s of the model t

trained on IRsome performance to those aat it is possiblereal practice snd thus less ob

g and Evaluatintinuous in-theur classificationas when the ussee Figure 1)second sine w

olume. The aure in 5 second

urs per day on nt places each different. Thisty to classify nday’s data alloto the SVM

e procedure ded again used Scollected data ata).

to classify newcollected da

trained on 1, 2ber of days, wefferent days tha0, 10, 5 and

5 days’ worross validation

wild using Galaxrestroom samples

# of ndata s7884853904132093

trained on differ

Rs captured ances, they fter higher e to output so that the btrusive.

on e-wild data n approach ser wears a . The data

wave sweep application

ds interval.

5 different day and as s approach new spaces. owed us to classifier’s escribed in VM as the (4169 non-

w restroom ata over 5 2, 3, 4, or 5 e evaluated at could be 1 possible

rth of data method to

xy Nexus. non-restroom samples

rent number

32

ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA

etooliacg9b

ImIthththiswpth“p

Sebinina

Fd

evaluate each co accurately cl

ones, taking intikely to revis

averaged the combinations fget the mean a9, illustrate thabalanced precis

mproving PredIn this section, he model’s pehe-wild data thhem. The firssolated predic

second type ofwhen the userprediction doeshird type of

“sporadic” andperiod of time.

Spark and spoeliminated to because it can bn or out of thenterval). Howe

a smoothing alg

Figure 10. Evaluadifferent window

combination. Tlassify previouto consideratiosit some plac

10-fold crosfor a given numaverage valuesat our model sion and recall

diction Errors we describe th

erformance in hat we collectst type of errotion of one cl

f error is a “bor enters or ls not reflect th

error is whed multiple wro

oradic predictisome extent

be assumed thae restroom for ever, boundarygorithm.

ation results of tw size (top: alg. 1

This tested theusly seen spaceon the fact thates and restross validation mber of days’ . The results, scan predict n(weighted F-M

he prediction eclassifying the

ted and discussor is a “sparklass instead ofoundary” errorleaves the reshat transition imen the SVM ng results are

ion errors potby a smoot

at people normonly 5 second

y errors cannot

the two smoothin1; bottom: alg. 2)

e model’s abilies as well as net a person wouoms. Then, wresults of a

worth of data shown in Figu

new spaces wiMeasure > 0.92

errors that affee continuous is how to correk,” which is f the other. Th, which happestroom, but thmmediately. Th

predictions areturned over

tentially can bthing algorithm

mally do not jumds (the samplinbe addressed b

ng algorithms w)

ity ew uld we all to

ure ith

2).

ect in-ect an he ns he he

are r a

be m, mp ng by

Smooththe cura diffealgorithfurtherbuffermatchcorrectalso chnew onmatch predictthe cur

Smooththe samfrom rdifferetime iapproxThus, transitinumbeN) that

We tesizes ftemporleave oone ofmodel remainalgorithaverageach wFigure algorithcompathat th“see” while “performincreasincreaswindowreason boundamajoritComparecall o

CONCIn thismethodacoustimicropdemonoutputt

with

thing Algorithmrrent space typerent space typhm will hold tr predictions fr

N. If the majthis different

t all buffered phange the currene. If the maj

the current tions will be crrent space typ

thing Algorithmme as the first restroom (non-ently. People tyin a restroom

ximately only to minimize

ions into a reser of restroom pt must be in the

ested the two from 3 to 30ral continuity one day’s dataf the five com

using four daning day’s dathms to the

ged the evaluatwindow size. T

10. Windowhm was used

ared to the onehe cross-validaa portion of e“leaving one dmance will incse the variatioses, the overalw sizes, howev

is that largary errors durty voting straared to the alof restroom bu

LUSION s paper, we dd of detectingics of an enviphone on nstrate that IRsted improve t

m I. The first pe predicted bype is predictedthis new predirom the model jority of the pt space type, predictions as ent space type jority of the bspace type,

classified as the remains unch

m II. Smoothione, except th-restroom) to ypically spendm (people h5 minutes in potential mis

stroom space, predictions (> e buffer windo

smoothing al0. Smoothing in the test dat

a out strategy mbinations, weays’ worth of

ata. Then, we SVM predicti

tion results of tThe performancw size zero

d. The differene shown in Figation strategy each day’s da

day’s data out” crease when trons of restroomll performancever, hurt the peer window siring room typategy used inlgorithm I, algut sacrificed the

escribed an ing restrooms bironment with mobile phons captured aftethe accuracy

algorithm keey the SVM mod by the modeiction and keepfor a pre-set wpredictions in then our algothe new spacthat the user i

buffered predithen all the

he current spachanged.

ing algorithm Ihat it treats the non-restroom

d a small amouhave reported

public restrosclassificationsalgorithm II r1/3 * N instea

ow to correct an

lgorithms usinalgorithms r

ta. Therefore, for evaluation

e first trained f data, and tes

applied our ion results. Fthe five combice results are r

means no nce in the pegure 9 is due allows the cl

ata during traindoes not. We

rained with moms. As the wi improves at ferformance. Onizes might cape transition dn smoothing agorithm II impe precision of r

nfrastructure-inby actively prthe built in sp

nes. Our eer a sine waveof prediction

ps track of odel. When el, then the p receiving

window size the buffer

orithm will e type and is in to this ictions still e buffered ce type and

II is almost transitions (restroom)

unt of their spending

oms [12]). of actual

reduces the ad of > ½ * n error.

ng window require the we used a

n. For each the SVM

sted on the smoothing

Finally we inations for reported in smoothing

erformance to the fact lassifier to ning phase expect that ore days to indow size first. Large ne possible ause more due to the algorithms. proves the restroom.

ndependent robing the peaker and evaluations e sweep is compared

33

SESSION: CONTEXTUAL AWARENESS ON MOBILE DEVICES

against only using the environment sound without a sweep. Models can be developed on different phones to classify new restrooms with a weighted F-Measure of 0.92~0.98. Occupancy, the presence of sounds, and the volume levels of the sweep do not affect the model’s performance in significant ways. We discuss three types of errors that affect the prediction model and propose temporal smoothing algorithms to improve the prediction accuracy.

REFERENCES 1. Agnihotri, S., Rovet, J., Cameron, D., Rasmussen, C.,

Ryan, J. and Keightley, M. SenseCam as an everyday memory rehabilitation tool for youth with fetal alcohol spectrum disorder. In Proc. SenseCam 2013, ACM (2013), 86-87.

2. Beritelli, F. and Grasso, R. A Pattern Recognition System for Environmental Sound Classification based on MFCCs and Neural Networks. In Proc. ICSPCS 2008, IEEE (2008), 1-4.

3. Byrne, D., Doherty, A., Jones, G., F., Smeaton, A. Kumpulainen, S. and Järvelin. K. 2008. The SenseCam as a tool for task observation. In Proc. BCS-HCI '08, Vol. 2, 2008, 19-22.

4. Caprani, N., O'Connor, N., Gurrin, C. Experiencing SenseCam: a case study interview exploring seven years living with a wearable camera. In Proc. SenseCam 2013, ACM (2013), 52-59.

5. Chang, C., Lin, C. LibSvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, vol. 2(3), 2011, 188-205.

6. Farina, A. Simultaneous Measurement of Impulse Response and Distortion with a Swept-Sine Technique. 108th Convention of the Audio Engineering Society, Paris, France. February 2000, 19-22.

7. Hempstalk, K., Frank, E., Witten, I. One-Class Classification by Combining Density and Class Probability Estimation. In Proc. ECML PKDD 2008, 505-519.

8. Hodges, S., Williams, L., Berry, E., Izadi, S., Srinivasan, J., Butler, A., Smyth, G., Kapur, N., Wood, K. SenseCam: A Retrospective Memory Aid. In Proc. Ubicomp 2006, ACM (2006), 177-193.

9. Keadle, S., Lyden, J., Hickey, A., Ray, E., Fowke, J., Freedson, P., Mattews, C. Validation of a previous day recall for measuring the location and purpose of active and sedentary behaviors compared to direct observation. Int J Behav Nutr Phys Act, 2014, 11:12.

10. Kerr, J., Marshall, S., Godbole, S., Chen, J., Legge, A., Doherty, A., Kelly, P., Oliver, M., Badland, H., Foster, C. Using the SenseCam to improve classifıcations of sedentary behavior in free-living settings. American Journal of Preventive Medicine, 44(3), 2013, 290–296.

11. Kelly, P., Marshall, S., Badland, H., Kerr, J., Oliver, M., Doherty, A., Foster, C. An ethical framework for automated, wearable cameras in health behavior

research. American Journal of Preventive Medicine, 44(3), 2013, 314-319.

12. Kientz, J.A., Choe, E.K., Truong, K.N., Texting from the Toilet: Mobile Computing Use and Acceptance in Private and Public Restrooms. Knowledge Media Design Institute, University of Toronto. Technical Report KMD-13-1. April 2013.

13. Kunze, K., Lukowicz, P. Symbolic object localization through active sampling of acceleration and sound signatures. In Proc. Ubicomp’07, ACM (2007),163-180.

14. Kuttruff, H. Room acoustics. CRC Press, 2009, 160-292.

15. Marcu, G., Dey, A., Kiesler, S. Parent-driven use of wearable cameras for autism support: a field study with families. In Proc. Ubicomp’12. ACM (2012), 401-410.

16. Peltomen, V., Tuomi, J., Klapuri, A., Huopaniemi, J. Sorsa, T. Computational auditory scene recognition. In Proc. ICASSP 2001, IEEE (2001), 1941-1944.

17. Perina, A., Jojic, N. In the sight of my wearable camera: Classifying my visual experience. Eprint arXiv: 1304. 7236, 04/2013.

18. Piccardi, L.; Noris, B.; Barbey, O.; Billard, A.; Schiavone, G.; Keller, F.; von Hofsten, C., "WearCam: A head mounted wireless camera for monitoring gaze attention and for the diagnosis of developmental disorders in young children,". In Proc. RO-MAN 2007, IEEE (2007), 594-598.

19. Pirsiavash, H. and Ramanan, D. Detecting activities of daily living in first-person camera views. In Proc. CVPR 2012, IEEE (2012), 2847-2854.

20. Quattoni, A. and Torralba, A. Recognizing indoor scenes. In Proc. CVPR’09, IEEE (2009), 413–420.

21. Rossi, M.; Feese, S.; Amft, O.; Braune, N.; Martis, S.; Tröster, G. AmbientSense: A real-time ambient sound recognition system for smartphones. In Proc. PerCom Workshop, IEEE (2013), 230-235.

22. Rossi, M., Seiter, J., Amft, O., Buchmeier, S. and Tröster, G. RoomSense: an indoor positioning system for smartphones using active sound probing. In Proc. AH 2013, ACM (2013), 89-95.

23. Rossing, T., Moore, R., Wheeler, P. The Science of Sound, 3rd Edition. Addison-Wesley. 2001.

24. Stan, G., Embrechts, J., and Archambeau, D. Comparison of different impulse response meansurement techniques. Journal of the Audio Engineering Society, 50(4), 2002, 249-262.

25. Templeman, R., Korayem, M., Crandall, D., Kapadia, A. PlaceAvoider: Steering First-Person Cameras away from Sensitive Spaces. In Proc. NDSS 2014.

26. Tian, Y., Yang, X., Yi, C. and Arditi, A. Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Machine Vision and Applications, 24(3), 2013, 521-535.

34

ISWC '14, SEPTEMBER 13 - 17, 2014, SEATTLE, WA, USA