VibLive: A Continuous Liveness Detection for Secure Voice ...
Linghan Zhang, Sheng Tan, Zi Wang, Yili Ren, Zhi Wang, Jie Yang
Dept. of CS, Florida State University, USA
Presenter: Linghan Zhang
ACSAC 2020, December 7-11, 2020 · Online
VibLive: A Continuous Liveness Detection for Secure Voice User Interface in IoT Environment
VUI in the IoT Environment
❑ The Smart Home
✓ Security control
▪ Smart locks
▪ Smart alarms
✓ Appliance control
▪ Smart kitchen
▪ Smart vacuums
▪ Smart plugs
✓ Personal business control
▪ E-commerce
▪ Daily schedule
❑ The Smart Office
✓ Access control
▪ Access to locations
▪ Access to devices
✓ Environment control
▪ Temperature control
▪ Lighting control
✓ Teamwork scheduling
❑ The Smart Vehicle
✓ In-car voice assistant
▪ Navigation
▪ Making phone calls
▪ Playing music
✓ Hands-free driving
Attacks on VUI
❑ VUI devices are vulnerable to replay attacks.
✓ Pre-recorded, concatenated, synthesized voices
✓ Easy access
✓ Highly effective
[Figure labels: Bank account, Passwords, Private conversation, ...]
❑ Successful attacks on VUI could cause severe consequences.
✓ Credential breaching
✓ Privacy leakage
✓ Burglary
✓ Vehicle misleading
Previous Work vs. Our Solution
❑ Previous work only supports authentication of wake words or registered passwords.
❖ Continuity
VibLive secures the whole communication session.
❑ Previous work requires extra devices or actions.
❖ Transparency
VibLive requires no additional operations or added hardware.
❑ Previous work necessitates close distances and fixed locations.
❖ Applicability
VibLive loosens the constraints on distances and locations.
System and Attack Model
[Figure: workflow of a typical VUI-capable device (Google Home service). Blocks: Voice recording; Activation? (Y/N); Speaker-dependent speech recognition; Speaker-independent speech recognition; Commands. Annotations: Attack at the activation phase; Attack at the speech recognition phase.]
❑ Typical VUI-capable devices’ workflow
✓ Activated with the “right” wake words
✓ Execute any commands after being activated
✓ Activated by authenticating the user speaking the wake words
✓ Only execute the authenticated user’s commands
❑ Replay attacks
✓ Activation
✓ Speech recognition
VibLive: Basic Idea
❑ Human bone-conducted vibrations and air-conducted voices are different.
❑ Loudspeakers’ rigid-body vibrations and replayed voices are always the same.
❑ Bone-conducted vibrations and air-conducted voices always coexist when a live human speaks.
❑ The VUI measures bone-conducted vibrations.
✓ The speaker of the VUI emits a probe signal.
✓ Bone-conducted vibrations modulate the probe signal.
✓ The mic of the VUI records the reflections and the voices.
Air Conduction vs. Bone Conduction
[Figure: two conduction paths that start from vocal fold vibration.
Air conduction: vocal folds (vibration) → vocal tract (resonance, formants) → articulators → air-conducted audible voices.
Bone conduction: vocal folds (vibration) → soft tissue vibration → skull bone vibration → soft tissue and skin vibration → bone-conducted vibrations.]
Air-conducted voices vs. Bone-conducted vibrations
[Figure: spectrograms (frequency in Hz) of phonemes /s/, /b/, and /m/ in three panels: (a) normal microphone recorded air-conducted phoneme, (b) contact microphone recorded bone-conducted phoneme, (c) VibLive sensed vibrations of phoneme. Annotations for /s/: Weaken, Attenuate. Annotations for /b/: Weakened, New Added, Enhanced. Annotation for /m/: Enhanced.]
❑ Different power distributions for unvoiced consonants like /s/.
❑ Some formants vary for plosive sounds like /b/.
❑ Low-frequency bands are enhanced for nasal sounds like /m/.
VibLive: Approach Overview
Live user confirmed. Executing commands
1. The VUI is activated with the wake words.
2. The built-in speaker emits a probe signal.
3. The built-in mic records both the probe signal reflections and the voices.
4. Separate voices and recover vibrations.
5. Extract features from both voices and vibrations for comparison.
VibLive: System Flow
[Figure: system flow. Stages: VUI Activation, Signal Collection, Signal Processing, Detection. Blocks: air-conducted voice recording, bone-conducted vibration sensing, bone-conducted vibration decoding (output: bone-conducted vibrations), signal denoising, signal segmentation, feature extraction, similarity comparison.]
Bone-conducted Vibrations Sensing
[Figure: probe signal, vibration signal, and received signal]
❑ In a short period of time ∆t, the head vibrates at frequency 𝒇𝟎, displacement Δs is:
❑ Meanwhile, the probe signal’s power change is inversely proportional to the transmission distance Δs.
Bone-conducted Vibrations Decoding
[Figure: the M-sample signal is split into N-point segments; a DFT of each segment yields $SPL_{f_p}$]
❑ We calculate the SPL of every N points of the M-sample-long signal.
❑ We concatenate the SPLs of all $M/N$ segments: $[SPL_{f_p,1},\ SPL_{f_p,2},\ \ldots,\ SPL_{f_p,M/N}]$.
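The decoding step can be illustrated with a short sketch. It is not the authors' implementation: the names `spl_track` and `am_probe`, the amplitude-modulation model of the vibration, the unit reference amplitude `ref`, and all numeric values in the usage note are assumptions chosen for the example. The sketch evaluates a single DFT bin at the probe frequency for every N-sample segment and concatenates the resulting levels.

```python
import cmath
import math


def spl_track(samples, fs, fp, n, ref=1.0):
    """Level (dB relative to `ref`, an assumed reference) at fp per n-sample segment."""
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        # Single-bin DFT of this segment, evaluated at the probe frequency fp.
        acc = 0j
        for k in range(n):
            acc += samples[start + k] * cmath.exp(-2j * math.pi * fp * k / fs)
        amplitude = 2.0 * abs(acc) / n  # amplitude of the fp component
        levels.append(20.0 * math.log10(max(amplitude, 1e-12) / ref))
    return levels  # one value per segment, M // N values in total


def am_probe(fs, fp, f0, depth, m):
    """Hypothetical test signal: a probe tone at fp whose amplitude is
    modulated at the vibration frequency f0 (an assumed model)."""
    return [
        (1.0 + depth * math.sin(2.0 * math.pi * f0 * i / fs))
        * math.sin(2.0 * math.pi * fp * i / fs)
        for i in range(m)
    ]
```

For instance, with an assumed 48 kHz sampling rate, an 18 kHz probe, and N = 48, `spl_track(am_probe(48000, 18000.0, 100.0, 0.2, 4800), 48000, 18000.0, 48)` returns 100 values whose level rises and falls with the 100 Hz modulation, while an unmodulated probe gives a flat track.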
Feature Extraction
❑ We examine the Linear Prediction (LP) spectrum for feature extraction.
❑ We choose the 15th-order LP spectrum to differentiate air-conducted voices and recovered vibrations.
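A minimal sketch of how an LP spectrum can be computed is given here. The slides do not state which estimation method is used, so the autocorrelation method with the Levinson-Durbin recursion is an assumption, as are the function names `lpc` and `lp_spectrum` and the number of evaluation points. No window is applied, which is a simplification.

```python
import cmath
import math


def lpc(frame, order):
    """LP coefficients a[0..order] (a[0] = 1) and residual energy, using the
    autocorrelation method and the Levinson-Durbin recursion."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_spectrum(a, gain, fs, n_points=256):
    """LP spectrum gain / |A(e^{jw})|^2 in dB, as (frequency_hz, level_db)
    pairs on n_points frequencies from 0 up to (but excluding) fs / 2."""
    spectrum = []
    for m in range(n_points):
        f = m * (fs / 2.0) / n_points
        w = 2.0 * math.pi * f / fs
        denom = sum(a[j] * cmath.exp(-1j * w * j) for j in range(len(a)))
        spectrum.append((f, 10.0 * math.log10(gain / abs(denom) ** 2)))
    return spectrum
```

Calling `lpc(frame, 15)` on a voice segment and on the corresponding recovered-vibration segment, then passing each result to `lp_spectrum`, gives two smooth spectral envelopes whose peaks can be compared.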
Similarity Comparison
❑ We extract peaks of the 15th-order LP spectra for similarity comparison.
❑ The peak numbers of the voice and the vibration can be different.
❑ We match the peaks that are close in the frequency axis.
❑ We pad 0 at a frequency point if there is no match.
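The peak matching and zero padding described above can be sketched as follows. The slides do not give the matching rule or the closeness threshold, so the greedy two-pointer matching, the tolerance parameter `tol_hz`, and the names `local_peaks` and `match_peaks` are assumptions made for illustration.

```python
def local_peaks(freqs, mags):
    """Frequencies at which the spectrum has a local maximum."""
    return [
        freqs[i]
        for i in range(1, len(mags) - 1)
        if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    ]


def match_peaks(voice_peaks, vib_peaks, tol_hz):
    """Align two peak lists. Peaks within tol_hz of each other are paired;
    an unmatched peak is paired with 0 in the other list."""
    voice = sorted(voice_peaks)
    vib = sorted(vib_peaks)
    i = j = 0
    out_voice, out_vib = [], []
    while i < len(voice) and j < len(vib):
        if abs(voice[i] - vib[j]) <= tol_hz:
            out_voice.append(voice[i])
            out_vib.append(vib[j])
            i += 1
            j += 1
        elif voice[i] < vib[j]:
            out_voice.append(voice[i])  # voice peak with no partner
            out_vib.append(0.0)
            i += 1
        else:
            out_voice.append(0.0)  # vibration peak with no partner
            out_vib.append(vib[j])
            j += 1
    for f in voice[i:]:
        out_voice.append(f)
        out_vib.append(0.0)
    for f in vib[j:]:
        out_voice.append(0.0)
        out_vib.append(f)
    return out_voice, out_vib
```

The two returned lists have equal length, so they can be fed to whichever similarity measure the detector uses.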
Experimental Evaluation: Data Collection
❑ Three types of phones with different sizes and audio chipsets.
❑ One external mic of Google Home microphone grade.
❑ Both short-range and long-range experiments.
❑ 25 participants, 2500 positive cases.
❑ Three types of loudspeakers, 2500 replay attacks.
❑ Three types of targeted attacks: occlusion, low SPL, and large angle.
Experimental Evaluation: Overall Performance
Experimental Evaluation: Attacks
Conclusion
❑ We design VibLive for secure VUI in IoT environments in both short-range and long-range applications.
❑ We develop an approach that allows VUI-capable devices to sense bone-conducted vibrations without extra hardware.
❑ VibLive supports text-independent liveness detection and can thus protect the whole communication session.
❑ Experimental results show that VibLive is effective under various setups.