Privacy Protection for Life-log Video
![Page 1: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/1.jpg)
Privacy Protection for Life-log Video
Jayashri Chaudhari
November 27, 2007
Department of Electrical and Computer Engineering
University of Kentucky, Lexington, KY 40507
![Page 2: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/2.jpg)
Outline
• Motivation and Background
• Proposed Life-Log System
• Privacy Protection Methodology
  • Face Detection and Blocking
  • Voice Segmentation and Distortion
• Experimental Results
  • Segmentation Algorithm Analysis
  • Audio Distortion Analysis
• Conclusions
![Page 3: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/3.jpg)
What is a Life-Log System?
“A system that records everything, at every moment and everywhere you go”
Applications include:
• Law enforcement
• Police questioning
• Tourism
• Medical questioning
• Journalism
Existing Systems/work
1) “MyLifeBits Project”: At Microsoft Research
2) “WearCam” Project: At University of Toronto, Steve Mann
3) “Cylon Systems”: http://cylonsystems.com, in the UK (a portable body-worn surveillance system)
![Page 4: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/4.jpg)
Technical Challenges
• Security and Privacy
• Information Management and Storage
• Information Retrieval
• Knowledge Discovery
• Human Computer Interface
![Page 6: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/6.jpg)
Why Privacy Protection?
• Privacy is a fundamental right of every citizen
• Emerging technologies threaten the right to privacy
• There are no clear and uniform rules and regulations regarding video recording
• People are resistant toward technologies like life-log
• Without tackling these issues, the deployment of such emerging technologies is impossible
![Page 7: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/7.jpg)
Research Contributions
• Practical audio-visual privacy protection scheme for life-log systems
• Performance measurement (audio) on
  • Privacy protection
  • Usability
![Page 8: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/8.jpg)
Proposed Life-log System
“A system that protects the audiovisual privacy of the persons captured by a portable video recording device”
![Page 9: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/9.jpg)
Privacy Protection Scheme
Design Objectives
• Privacy
  • Hide the identity of the subjects being captured
• Privacy versus usefulness
  • Recording should convey sufficient information to be useful
[Figure: three examples labeled "√ Usefulness, × Privacy", "× Usefulness, √ Privacy", and "√ Usefulness, √ Privacy".]
![Page 10: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/10.jpg)
Design Objectives
• Anonymity or ambiguity
  • The scheme should make the identity of the recorded subjects ambiguous
  • Every individual will look and sound identical
  • Reduce correlation attacks
• Speed
  • Protection scheme should work in real time
• Interview scenario
  • Producer is speaking with a single subject in a relatively quiet room
![Page 11: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/11.jpg)
Privacy Protection Scheme Overview
[Figure: block diagram of the scheme. The audio stream passes through Audio Segmentation and Audio Distortion; the video stream passes through Face Detection and Blocking; both streams go through Synchronization & Multiplexing and are written to storage.]
S: Subject (the person who is being recorded)
P: Producer (the person who is the user of the system)
![Page 12: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/12.jpg)
Voice Segmentation and distortion
[Figure: segmentation flowchart. For each window k, the windowed power Pk is computed and tested against two thresholds ("Pk < TS" and "Pk < TP"); the outcome sets State_k to State_(k-1), Subject, or Producer. The remaining blocks in the flowchart are Pitch Shifting and Storage.]
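The flowchart does not survive well enough in this transcript to recover the exact branch logic, so the following is only a minimal sketch of a power-threshold segmenter of this kind. It assumes, and this is not confirmed by the slides, that the producer is closer to the microphone and therefore louder: power at or above T_P is labeled Producer, power between T_S and T_P is labeled Subject, and power below T_S (silence) keeps the previous state. The function names, the window handling and the initial state are hypothetical.

```python
def windowed_power(samples, window):
    """Mean squared amplitude of each non-overlapping window of `window` samples."""
    return [
        sum(s * s for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def segment(powers, t_s, t_p, initial="Producer"):
    """Label each window as 'Subject' or 'Producer' from its power P_k.

    ASSUMPTION (not taken from the slides): t_s < t_p, louder speech
    belongs to the producer, and silence keeps the previous state.
    """
    states = []
    state = initial
    for p in powers:
        if p >= t_p:
            state = "Producer"
        elif p >= t_s:
            state = "Subject"
        # p < t_s: treated as silence, State_k = State_(k-1)
        states.append(state)
    return states
```

In a scheme like this, only windows labeled Subject would be sent on to pitch shifting, while Producer windows would be stored unchanged.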
We use the PitchSOLA time-domain pitch shifting method.
* “DAFX: Digital Audio Effects” by Udo Zölzer et al.
![Page 13: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/13.jpg)
Pitch Shifting Algorithm
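Pitch Shifting (Synchronous Overlap and Add):
Step 1) Time stretching by a factor of α using a window of size N and step size Sa
[Figure: the input audio is cut into overlapping windows x1(n), x2(n) of length N taken every Sa samples. Each window is placed at α*Sa in the output and shifted by the offset km that gives maximum correlation with the previous window, which reduces discontinuities in phase and pitch; the overlapping parts are then mixed.]
Step 2) Re-sampling by a factor of 1/α to change pitch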
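The two steps can be sketched in plain Python as follows. This is a simplified illustration, not the DAFX reference code or the authors' implementation: the search range `search`, the unnormalized correlation, the linear cross-fade and the linear-interpolation resampler are all assumptions chosen to keep the example short. Only N, Sa and α correspond to symbols on the slide.

```python
import math


def sola_time_stretch(x, alpha, n=2048, sa=256, search=128):
    """Step 1: time-stretch x by roughly alpha using synchronous overlap-add."""
    ss = int(round(alpha * sa))          # synthesis step, alpha * Sa
    if n < ss + 2 * search:
        raise ValueError("window too short for this alpha and search range")
    out = list(x[:n])                    # first window is copied unchanged
    k = 1
    while k * sa + n <= len(x):
        grain = x[k * sa:k * sa + n]     # next analysis window, Sa samples later
        target = k * ss                  # nominal position in the output
        # Offset km with maximum cross-correlation against the existing
        # output, to reduce phase and pitch discontinuities.
        best_km, best_corr = 0, -math.inf
        for km in range(search):
            corr = sum(out[target + km + i] * grain[i] for i in range(search))
            if corr > best_corr:
                best_km, best_corr = km, corr
        start = target + best_km
        del out[start + n:]              # overlap never exceeds one window
        overlap = len(out) - start
        for i in range(overlap):         # mixing: linear cross-fade
            w = i / overlap
            out[start + i] = (1.0 - w) * out[start + i] + w * grain[i]
        out.extend(grain[overlap:])
        k += 1
    return out


def resample_linear(x, step):
    """Step 2: read x at a fractional step using linear interpolation."""
    count = int((len(x) - 1) / step)
    out = []
    for m in range(count):
        t = m * step
        i = int(t)
        frac = t - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return out


def pitch_shift(x, alpha, n=2048, sa=256, search=128):
    """Stretch by alpha, then resample so the duration returns to about the original."""
    return resample_linear(sola_time_stretch(x, alpha, n, sa, search), alpha)
```

With α > 1 the stretched signal is read back faster than it was written, so the duration stays roughly the same while the pitch rises by about a factor of α; with α < 1 the pitch drops.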
![Page 14: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/14.jpg)
Face Detection and Blocking
[Figure: processing pipeline. Camera, Face Detection, Face Tracking, Subject Selection, Selective Blocking. The audio segmentation results (subject talking / producer talking) are shown as an additional input.]
Face detection is based on Viola & Jones 2001.
![Page 15: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/15.jpg)
Initial Experiments [1]
• Analysis of segmentation algorithm
• Analysis of audio distortion algorithm
  1) Accuracy in hiding identity
  2) Usability after distortion
[1] Chaudhari J., S.-C. Cheung, and M. V. Venkatesh. Privacy protection for life-log video. In IEEE Signal Processing Society SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, 2007.
![Page 16: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/16.jpg)
Segmentation Experiment
Experimental data:
• Interview scenario in a quiet meeting room
• Three interview recordings, each about 1 minute and 30 seconds long
[Figure: example timeline of transitions between P, S and silence segments.]
S: Subject speaking
P: Producer speaking
![Page 17: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/17.jpg)
Segmentation Results
| Meeting # | Transitions (ground truth) | Correctly identified transitions | Falsely detected transitions | Precision | Recall |
|---|---|---|---|---|---|
| 1 | 7 | 6 | 10 | 0.375 | 0.857 |
| 2 | 7 | 7 | 5 | 0.583 | 1 |
| 3 | 6 | 6 | 10 | 0.353 | 1 |
$$\text{Recall} = \frac{\#\ \text{correctly identified transitions}}{\#\ \text{transitions in ground truth}}$$

$$\text{Precision} = \frac{\#\ \text{correctly identified transitions}}{\#\ \text{identified transitions}}$$
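As a quick check of the two definitions, the sketch below recomputes precision and recall from the counts in the results table, assuming that the number of identified transitions is the sum of correctly identified and falsely detected transitions (the function name is illustrative). Under that assumption meetings 1 and 2 reproduce the reported values; for meeting 3 the same formula gives 6/16 = 0.375 rather than the 0.353 shown, so that entry may have been counted differently or mis-transcribed.

```python
def precision_recall(ground_truth, correct, false_detections):
    """Precision and recall of detected speaker transitions.

    Assumes identified transitions = correct + false detections.
    """
    identified = correct + false_detections
    precision = correct / identified if identified else 0.0
    recall = correct / ground_truth if ground_truth else 0.0
    return precision, recall


# (ground truth, correctly identified, falsely detected) per meeting
meetings = {1: (7, 6, 10), 2: (7, 7, 5), 3: (6, 6, 10)}
for meeting, counts in meetings.items():
    p, r = precision_recall(*counts)
    print(f"meeting {meeting}: precision={p:.3f} recall={r:.3f}")
```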
![Page 18: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/18.jpg)
Comparison With CMU Segmentation Algorithm
| Meeting # | Our algorithm: Precision | Our algorithm: Recall | CMU algorithm: Precision | CMU algorithm: Recall |
|---|---|---|---|---|
| 1 | 0.375 | 0.857 | 0.667 | 0.57 |
| 2 | 0.583 | 1 | 1 | 0.57 |
| 3 | 0.353 | 1 | 0.4 | 0.5 |

CMU audio segmentation algorithm [1] used as benchmark
[1] Matthew A. Seigler, Uday Jain, Bhiksha Raj, and Richard M. Stern. Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the Ninth Spoken Language Systems Technology Workshop, Harriman, New York, 1997.
![Page 19: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/19.jpg)
Speaker Identification Experiment
Experimental Data
• 11 test subjects, 2 voice samples from each subject
• One voice sample is used for training and the other for testing
• Public-domain speaker recognition software
• Train: Script 1 ("This script is used for training the speaker recognition software")
• Test: Script 2 ("This script is used to test the performance of audio distortion in hiding the identity")
![Page 20: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/20.jpg)
Speaker Identification Results
Each cell gives the person ID returned by the speaker recognition software.

| Person ID | Without distortion | Distortion 1 | Distortion 2 | Distortion 3 |
|---|---|---|---|---|
| 1 | 1 | 5 | 8 | 5 |
| 2 | 2 | 6 | 8 | 6 |
| 3 | 3 | 5 | 3 | 5 |
| 4 | 4 | 6 | 6 | 5 |
| 5 | 5 | 3 | 10 | 6 |
| 6 | 6 | 8 | 6 | 5 |
| 7 | 7 | 5 | 2 | 5 |
| 8 | 8 | 10 | 11 | 5 |
| 9 | 9 | 5 | 8 | 5 |
| 10 | 10 | 5 | 2 | 5 |
| 11 | 11 | 4 | 8 | 5 |
| Error rate | 0% | 100% | 90.9% | 100% |

Distortion 1: (N=2048, Sa=256, α=1.5)
Distortion 2: (N=2048, Sa=300, α=1.1)
Distortion 3: (N=1024, Sa=128, α=1.5)
![Page 21: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/21.jpg)
Usability Experiments
Experimental Data
• 8 subjects, 2 voice samples from each subject
• One voice sample is used without distortion and the other is distorted
• Manual transcription (5 human testers)
• 1.wav (transcription 1): "This transcription is of undistorted voice --- stored in one dot wav file."
• 2.wav (transcription 2): "This transcription is of distorted voice sample --- in two dot wav ---."
[Figure legend: Manual Transcription; Unrecognized words]
![Page 22: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/22.jpg)
Usability after distortion
Word Error Rate (WER): standard measure of word recognition error for speech recognition systems

$$\text{WER} = \frac{S + D + I}{N}$$

• S = number of substitutions
• D = number of deletions
• I = number of insertions
• N = number of words in the reference sample
Tool used: NIST tool SCLITE
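For readers without SCLITE at hand, the WER definition above can be illustrated with a word-level edit distance, since the minimum number of substitutions, deletions and insertions needed to turn the reference into the transcription equals S + D + I. This is a minimal sketch with a hypothetical function name and whitespace tokenization; it does not reproduce SCLITE's text normalization or alignment options.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the words of ref seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the transcription is much longer than the reference.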
![Page 23: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/23.jpg)
Extended Experiments
• Data set: TIMIT (Texas Instruments and Massachusetts Institute of Technology) speech corpus
• Experimental setup
  • Allowable range of alpha (α): 0.2-2.0
  • Five alpha values (α = 0.5, 0.75, 1, 1.25, 1.40)
• Increase the scope of experiments
  • “Subjective Experiments”: use testers to assess privacy and usability
  • Privacy Experiments (Speaker Identification)
![Page 24: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/24.jpg)
Experimental Setup
• TIMIT corpus: 630 speakers, 10 audio clips per speaker
• Our experiments: 30 speakers, 5 audio clips per speaker
• One set per pitch scaling factor: Set A (α=1), Set B (α=0.5), Set C (α=0.75), Set D (α=1.25), Set E (α=1.40)
• 30 audio clips in total in each set
• The audio clips from each set are re-divided into five groups (1-5)
• Each group consists of 6 audio clips randomly selected from each set
• Each group was assigned to three testers, who were asked to do 3 tasks
![Page 25: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/25.jpg)
Subjective Experiments
Task 1: Transcribe audio clips in the assigned group.
Purpose: Determine usability of the recording after distortion
Results:
• Metric: WER for each transcription by the tester
• Average WER for each clip from 3 testers
• WER for each speaker with the given alpha (α) value
![Page 26: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/26.jpg)
[Figure: effect of distortion on WER. Average WER percentage (axis 0-120) per Person ID (1-30) for Sets A, B, C, D and E.]
![Page 27: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/27.jpg)
[Figure: average WER per speaker for each alpha value (axis 0-60), plotted per Person ID for Sets A, C, D and E; annotated ranges: (0-30), (0-60), (0-35), (0-35).]
![Page 28: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/28.jpg)
Average WER per Set

| Set | A | B | C | D | E |
|---|---|---|---|---|---|
| Avg WER (%) | 14.2 | 100 | 22.4 | 15.3 | 14.4 |
![Page 29: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/29.jpg)
Statistical Analysis: Z-test calculations
Null hypothesis: for a given value of the pitch scaling parameter (alpha), the average WER after distortion does not change from that of Set A (before distortion).
H0: p1 = p2 (null hypothesis); Ha: p1 ≠ p2

Z-test parameters

| Parameter | Value |
|---|---|
| Population size | 12*30 = 360 |
| α (significance level) | 0.05 |
| Confidence level | 95% |
| Z-test critical value (absolute value of Zα/2) | 1.96 |
| Rule for rejection of H0 | Z >= Zα/2 or Z <= -Zα/2 |

Z-test results

| Comparison (pitch scaling alpha) | Statistic |
|---|---|
| Set A and B (0.50) | 46.71 >= 1.96 |
| Set A and C (0.75) | 2.873 >= 1.96 |
| Set A and D (1.25) | 0.419 <= 1.96 |
| Set A and E (1.40) | 0.0695 <= 1.96 |
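The rejection rule can be applied mechanically to the reported statistics. The sketch below only derives the two-sided critical value from the significance level and checks each reported Z value against it; it does not recompute the Z statistics themselves, because the slides do not give the exact formula or the per-clip data behind them. The function and variable names are illustrative.

```python
from statistics import NormalDist


def z_critical(significance=0.05):
    """Two-sided critical value |Z_(alpha/2)| of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - significance / 2.0)


def reject_null(z, significance=0.05):
    """H0 is rejected when Z >= Z_(alpha/2) or Z <= -Z_(alpha/2)."""
    return abs(z) >= z_critical(significance)


# Z statistics as reported on the slide, keyed by the set compared with Set A.
reported = {"B (0.50)": 46.71, "C (0.75)": 2.873, "D (1.25)": 0.419, "E (1.40)": 0.0695}
for name, z in reported.items():
    verdict = "reject H0" if reject_null(z) else "fail to reject H0"
    print(f"Set A vs Set {name}: Z={z} -> {verdict}")
```

Read this way, the change in WER is statistically significant for Sets B and C, while Sets D and E are not distinguishable from the undistorted Set A at the 95% confidence level.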
![Page 30: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/30.jpg)
Subjective Experiments
Task 2: Identify the number of distinct voices in each subset in the assigned group.
Purpose: Estimate ambiguity created by pitch shifting
Results: average number of distinct voices per subset (each subset consists of 6 audio clips)

| Group | Subset of A | Subset of B | Subset of C | Subset of D | Subset of E |
|---|---|---|---|---|---|
| 1 | 6.0 | 3.33 | 4.33 | 4.0 | 3.33 |
| 2 | 6.0 | 3.0 | 3.33 | 4.0 | 4.0 |
| 3 | 6.0 | 2.0 | 4.0 | 3.0 | 4.0 |
| 4 | 6.0 | 2.67 | 4.0 | 3.67 | 2.67 |
| 5 | 6.0 | 3.0 | 3.0 | 3.67 | 4.0 |
| Average number of distinct voices | 6.0 | 2.75 | 3.92 | 3.67 | 3.50 |
![Page 31: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/31.jpg)
Subjective Experiments
Task 3: For each clip from subset of Set A (which is the original un-distorted speech set); identify a clip in other subsets in which the same speaker may be speaking
Purpose: Qualitatively measure the assurance of Privacy Protection achieved by distortion
Results: None of the speakers from set A was identified from other distorted sets. (100% Recognition Error Rate)
![Page 32: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/32.jpg)
Privacy Experiments
Speaker Identification Experiments
• ASR tools (LIA_Spk-Det and ALIZE) [1] by the LIA lab at the University of Avignon
• Speaker verification tool
• GMM-UBM (Gaussian Mixture Model-Universal Background Model)
  • Single speaker-independent background model
• Decision: likelihood ratio

$$\frac{p(Y \mid H_0)}{p(Y \mid H_1)}$$

[1] Bonastre, J.-F., Wild, F., Alize: a free, open tool for speaker recognition, http://www.lia.univ-avignon.fr/heberges/ALIZE/
![Page 33: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/33.jpg)
LIA_RAL Speaker-Det
• Front Processing (produces the feature vectors)
  • Feature Extraction (SPRO tool): 32 coefficients = 16 LFCC + 16 derivative coefficients (SPRO4)
  • Silence Frame Removal (EnergyDetector)
  • Parameter Normalization (NormFeat)
• World Modeling (TrainWorld): Initialization, Training, Warping
  • 2 GMMs (2048 components): 1: Male, 2: Female
• Target Speaker Modeling (TrainTarget)
  • Bayesian Adaptation (MAP): adapts a World Model
• Speaker Detection (ComputeTest)

$$\mathrm{LLR}(s \mid T) = \log \frac{l(s \mid T)}{l(s \mid W)}$$
![Page 34: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/34.jpg)
Experimental Setup
• World Model
  • Number of male speakers = 325
  • Number of female speakers = 135
• Target Speaker Model
  • Number of male test clips = 20
  • Number of female test clips = 10
• Two sets of experiments
  • Same Model: World Model and Individual Speaker Models (training set: distorted speech with the corresponding alpha)
  • Cross Model: World Model and Individual Speaker Models (training set: undistorted speech)
![Page 35: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/35.jpg)
Privacy Results
| Set (alpha) | Sex | Same Model | Cross Model |
|---|---|---|---|
| Set A | M | 1.0 | 1.0 |
| Set A | F | 4.4 | 4.4 |
| Set B | M | 2.5 | 150.75 |
| Set B | F | 1.7 | 57.80 |
| Set C | M | 8.65 | 170.90 |
| Set C | F | 5.4 | 46.40 |
| Set D | M | - | 185.75 |
| Set D | F | 20.30 | 67.80 |
| Set E | M | 52.05 | 157.45 |
| Set E | F | 29.20 | 79.80 |

The numbers in the table are the average rank of the true speakers of the test clips for the corresponding alpha value.

Conclusions
• Cross Model: Distorted speech, no matter what alpha value is used, is very different from the original speech.
• Same Model: Set B and Set C do not provide adequate protection, as the rank is still very near the top.
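The average-rank figure can be read as follows: for each test clip, all target speaker models are sorted by their score for that clip, and the position of the true speaker in that ordering is recorded (1 means the true speaker scored highest). The sketch below assumes that a higher score, such as a log-likelihood ratio, means a better match; the function names and the toy scores are illustrative and not taken from the experiments.

```python
def true_speaker_rank(scores, true_speaker):
    """1-based rank of the true speaker when models are sorted by descending score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(true_speaker) + 1


def average_rank(trials):
    """Mean rank over (scores, true_speaker) pairs, one pair per test clip."""
    ranks = [true_speaker_rank(scores, speaker) for scores, speaker in trials]
    return sum(ranks) / len(ranks)


# Toy example with three enrolled speakers and two test clips.
clip_scores = {"spk1": -1.2, "spk2": 0.4, "spk3": -0.1}
print(average_rank([(clip_scores, "spk2"), (clip_scores, "spk3")]))
```

An average rank near 1 means the recognizer still finds the true speaker, so higher values indicate stronger privacy protection.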
![Page 36: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/36.jpg)
Example Video
![Page 37: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/37.jpg)
Conclusions
• Proposed real-time implementation of voice distortion and face blocking for privacy protection in life-log video
• Analysis of audio segmentation
• Analysis of audio distortion for usability
• Analysis of audio distortion for privacy protection
![Page 38: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/38.jpg)
Acknowledgment
• Prof. Samson Cheung
• People at Center of Visualization and Virtual Environment
• Prof. Donohue and Prof. Zhang
Thank you!
![Page 39: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/39.jpg)
![Page 40: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/40.jpg)
Voice Distortion
• Voice identity
  • Vocal tract (formants): filters
  • Vocal cords (pitch): excitation source
• Different ways to distort audio
  • Random mixture
    • Makes the recording useless
  • Voice transformation
    • For example,
    • More complex, not suitable for real-time applications
  • Pitch shifting
    • Changes the pitch of voice
    • Keeps the recording useful
• PitchSOLA time-domain pitch shifting method *
  • Simple, with less complexity
* “DAFX: Digital Audio Effects” by Udo Zölzer et al.
![Page 41: Privacy Protection for Life-log Video](https://reader038.fdocuments.net/reader038/viewer/2022102809/56814548550346895db2179b/html5/thumbnails/41.jpg)
• Cross Model:
• World Model and Individual Speaker Models: (Training Set: un-distorted speech)
• Same Model
• World Model and Individual Speaker Models: (Training Set: distorted speech)