VibLive: A Continuous Liveness Detection for Secure Voice ...

Linghan Zhang, Sheng Tan, Zi Wang, Yili Ren, Zhi Wang, Jie Yang

Dept. of CS, Florida State University, USA

Presenter: Linghan Zhang

ACSAC 2020 · December 7-11, 2020 · Online

VibLive: A Continuous Liveness Detection for Secure Voice User Interface in IoT Environment

VUI in the IoT Environment

❑ The Smart Home

✓ Security control

▪ Smart locks

▪ Smart alarms

✓ Appliance control

▪ Smart kitchen

▪ Smart vacuums

▪ Smart plugs

✓ Personal business control

▪ E-commerce

▪ Daily schedule


❑ The Smart Office

✓ Access control

▪ Access to locations

▪ Access to devices

✓ Environment control

▪ Temperature control

▪ Lighting control

✓ Teamwork scheduling

❑ The Smart Vehicle

✓ In-car voice assistant

▪ Navigation

▪ Making phone calls

▪ Playing music

✓ Hands-free driving

Attacks on VUI

❑ VUI devices are vulnerable to replay attacks.

✓ Pre-recorded, concatenated, synthesized voices

✓ Easy access

✓ Highly effective

[Figure labels: Bank account, Passwords, Private conversation, ……]

Attacks on VUI

❑ Successful attacks on VUI could cause severe consequences.

✓ Credential breaches

✓ Privacy leakage

✓ Burglary

✓ Vehicle misdirection

Previous Work vs. Our Solution

❑ Only supports wake-word or registered-password authentication

❖ Continuity

VibLive secures the whole communication session.

❑ Requires extra devices or actions

❖ Transparency

VibLive requires no additional operations or added hardware.

❑ Necessitates close distances and fixed locations

❖ Applicability

VibLive loosens the constraints on distances and locations.

System and Attack Model

[Figure: workflow diagram with blocks Voice recording, Activation? (Y/N), Speaker-dependent speech recognition, Speaker-independent speech recognition, Commands, and Google Home Service. Attack points are marked at the activation phase and at the speech recognition phase.]

❑ Typical VUI-capable devices’ workflow

✓ Activated with the “right” wake words

✓ Execute any commands after being activated

✓ Activated by authenticating the user speaking the wake words

✓ Only execute the authenticated user’s commands

❑ Replay Attacks

✓ Activation

✓ Speech Recognition

VibLive: Basic Idea

❑ Human bone-conducted vibrations and air-conducted voices are different.

❑ Loudspeakers’ rigid-body vibrations and replayed voices are always the same.

❑ Bone-conducted vibrations and air-conducted voices always coexist when a live human speaks.

VibLive: Basic Idea

❑ VUI measures bone-conducted vibrations

✓ The speaker of the VUI emits a probe signal.

✓ Bone-conducted vibrations modulate the probe signal.

✓ The mic of the VUI records reflections and voices.
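The probe emission step can be sketched as follows. This is a minimal illustration only: the slides do not state the probe waveform, so the pure sine tone, the 20 kHz frequency, the 48 kHz sample rate, and the function name `make_probe` are all assumptions.

```python
import numpy as np

def make_probe(freq_hz=20_000.0, duration_s=0.5, sample_rate=48_000, amplitude=0.5):
    """Return a sine tone that a built-in speaker could play as a probe signal.

    Every default value here is an illustrative assumption, not a value
    taken from the slides.
    """
    if freq_hz >= sample_rate / 2:
        raise ValueError("probe frequency must be below the Nyquist frequency")
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

probe = make_probe()

# Sanity check: the strongest spectral component sits at the probe frequency.
spectrum = np.abs(np.fft.rfft(probe))
freqs = np.fft.rfftfreq(len(probe), d=1.0 / 48_000)
probe_peak_hz = freqs[np.argmax(spectrum)]
```

A tone near the top of the audible range is chosen here so that it stays clear of the voice band, which makes the later separation step simpler.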


Air Conduction vs. Bone Conduction

[Figure: speech conduction diagram. Air conduction path labels: Vocal Folds (vibration), Vocal Tract (resonance, formants), Articulators, Air-conducted Audible Voices. Bone conduction path labels: Soft Tissue Vibration, Bone Vibration (skull bone), Soft Tissue and Skin, Bone-conducted Vibrations.]

Air-conducted voices vs. Bone-conducted vibrations

[Figure: frequency (Hz) plots of phonemes /s/, /b/, and /m/ in three panels: (a) normal microphone recorded air-conducted phoneme, (b) contact microphone recorded bone-conducted phoneme, (c) VibLive sensed vibrations of phoneme. Annotations mark weakened, attenuated, newly added, and enhanced frequency components.]

❑ Different power distributions for unvoiced consonants like /s/.

❑ Some formants vary for plosive sounds like /b/.

❑ Low frequency bands are enhanced for nasal sounds like /m/.


VibLive: Approach Overview

Live user confirmed. Executing commands

1. Activated with the wake words.

2. Built-in speaker emits a probe signal.

3. Built-in mic records both probe signal reflections and voices.

4. Separate voices and recover vibrations.

5. Extract features from both voices and vibrations for comparison.
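Step 4 (separating voices from the probe reflections) can be illustrated with a simple frequency-domain split. This is a sketch under assumptions: the slides do not describe the separation filter, so the brick-wall FFT masking, the 8 kHz voice cutoff, the 20 kHz probe, and the name `split_bands` are hypothetical choices.

```python
import numpy as np

def split_bands(recording, sample_rate, probe_hz,
                voice_cutoff_hz=8_000.0, probe_half_bw_hz=500.0):
    """Split one microphone recording into a voice band and a probe band.

    Uses ideal (brick-wall) FFT masks; a practical system would more likely
    use proper filters. All band edges are illustrative assumptions.
    """
    spectrum = np.fft.rfft(recording)
    freqs = np.fft.rfftfreq(len(recording), d=1.0 / sample_rate)
    voice_mask = freqs <= voice_cutoff_hz
    probe_mask = np.abs(freqs - probe_hz) <= probe_half_bw_hz
    voice = np.fft.irfft(spectrum * voice_mask, n=len(recording))
    probe_band = np.fft.irfft(spectrum * probe_mask, n=len(recording))
    return voice, probe_band

# Toy recording: a 200 Hz "voice" tone plus a 20 kHz probe tone.
sample_rate = 48_000
t = np.arange(4_800) / sample_rate
voice_in = 0.8 * np.sin(2 * np.pi * 200.0 * t)
probe_in = 0.3 * np.sin(2 * np.pi * 20_000.0 * t)
voice_out, probe_out = split_bands(voice_in + probe_in, sample_rate, probe_hz=20_000.0)
```

In this toy case both tones fall exactly on DFT bins, so the two outputs reproduce the two inputs almost exactly; real recordings would show some leakage between bands.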


VibLive: System Flow

[Figure: system flow diagram. Stages: VUI Activation, Signal Collection, Signal Processing, Detection. Blocks: Air-conducted voice recording, Bone-conducted vibration sensing, Bone-conducted vibration decoding, Bone-conducted vibrations, Signal denoising, Signal segmentation, Feature extraction, Similarity comparison.]

Bone-conducted Vibrations Sensing

[Figure: waveforms labeled Probe Signal, Vibration Signal, and Received Signal.]

❑ In a short period of time Δt, the head vibrates at frequency f₀, and the displacement Δs is: [equation not recoverable]

❑ Meanwhile, the probe signal’s power change is inversely proportional to the transmission distance Δs.
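The modulation effect described above can be illustrated with a toy simulation. This is not the authors' signal model: it assumes, purely for illustration, that the vibration at f₀ scales the received probe amplitude sinusoidally (simple amplitude modulation). The function name, the modulation depth, and all numeric values are hypothetical.

```python
import numpy as np

def simulate_received(probe_hz, vib_hz, depth, duration_s, sample_rate):
    """Probe tone whose amplitude is modulated by a sinusoidal vibration.

    Assumed model: received(t) = (1 + depth * sin(2*pi*f0*t)) * sin(2*pi*fp*t).
    """
    t = np.arange(int(round(duration_s * sample_rate))) / sample_rate
    envelope = 1.0 + depth * np.sin(2 * np.pi * vib_hz * t)
    return envelope * np.sin(2 * np.pi * probe_hz * t)

sample_rate = 48_000
received = simulate_received(probe_hz=20_000.0, vib_hz=200.0, depth=0.1,
                             duration_s=0.1, sample_rate=sample_rate)

# The vibration shows up as sidebands at fp - f0 and fp + f0.
magnitude = np.abs(np.fft.rfft(received))
freqs = np.fft.rfftfreq(len(received), d=1.0 / sample_rate)
carrier = magnitude[np.argmin(np.abs(freqs - 20_000.0))]
lower = magnitude[np.argmin(np.abs(freqs - 19_800.0))]
upper = magnitude[np.argmin(np.abs(freqs - 20_200.0))]
sideband_ratio = lower / carrier  # equals depth / 2 under this model
```

Under this assumed model, the vibration information is carried entirely by the envelope of the probe tone, which is why tracking the probe's level over time can recover it.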


Bone-conducted Vibrations Decoding

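[Figure: the received signal is divided into consecutive N-point segments; a DFT of each segment yields $\mathrm{SPL}_{f_p}$, the SPL at the probe frequency $f_p$.]

❑ We calculate the SPL of every N points of the M-sample-long signal.

❑ We concatenate the SPLs of all $M/N$ segments: $[\mathrm{SPL}_{f_p,1},\ \mathrm{SPL}_{f_p,2},\ \ldots,\ \mathrm{SPL}_{f_p,M/N}]$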
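The segment-wise decoding can be sketched as below. This is a minimal sketch under assumptions: the SPL reference level (digital full scale, `ref=1.0`), the segment length N = 48, the probe frequency, and the function name `decode_vibration` are not given in the slides and are chosen only for illustration.

```python
import numpy as np

def decode_vibration(received, sample_rate, probe_hz, segment_len, ref=1.0):
    """Return one SPL value (in dB re `ref`) per N-point segment.

    For each segment, the DFT bin closest to the probe frequency gives the
    probe amplitude; SPL is 20*log10(amplitude / ref). The reference level
    is an assumption (digital full scale), not a calibrated pressure.
    """
    n_segments = len(received) // segment_len            # M / N segments
    k = int(round(probe_hz * segment_len / sample_rate))  # probe DFT bin
    spl = np.empty(n_segments)
    for i in range(n_segments):
        seg = received[i * segment_len:(i + 1) * segment_len]
        amp = 2.0 * np.abs(np.fft.rfft(seg)[k]) / segment_len
        spl[i] = 20.0 * np.log10(max(amp, 1e-12) / ref)
    return spl

# Toy input: 20 kHz probe with a 200 Hz amplitude modulation (assumed model).
sample_rate = 48_000
t = np.arange(4_800) / sample_rate
received = (1.0 + 0.1 * np.sin(2 * np.pi * 200.0 * t)) * np.sin(2 * np.pi * 20_000.0 * t)
spl = decode_vibration(received, sample_rate, probe_hz=20_000.0, segment_len=48)
```

With N = 48 at 48 kHz, each SPL value covers 1 ms, so the concatenated SPL sequence is effectively a 1 kHz sampled version of the vibration envelope. A smaller N gives finer time resolution at the cost of coarser frequency resolution.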


Feature Extraction

❑ We examine the Linear Prediction (LP) spectrum for feature extraction.

❑ We choose the 15th-order LP spectrum to differentiate air-conducted voices and recovered vibrations.
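A 15th-order LP spectrum can be computed as sketched below. The slides do not say which LP estimation method is used; this sketch assumes the common autocorrelation method with a Hamming window, and the function names, FFT size, and test signal are hypothetical.

```python
import numpy as np

def lp_coefficients(x, order=15):
    """Autocorrelation-method linear prediction: solve R a = r for a."""
    x = np.asarray(x, dtype=float) * np.hamming(len(x))
    # Autocorrelation at lags 0..order (zero lag is at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny ridge for numerical stability
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^-k

def lp_spectrum(x, sample_rate, order=15, n_fft=1024):
    """Return (frequencies in Hz, LP spectrum 1/|A(f)| in dB)."""
    a = lp_coefficients(x, order)
    A = np.fft.rfft(a, n=n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, 20.0 * np.log10(1.0 / np.maximum(np.abs(A), 1e-12))

# Toy signal with two spectral peaks plus a little noise.
rng = np.random.default_rng(0)
sample_rate = 16_000
t = np.arange(2_048) / sample_rate
signal = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)
          + 0.01 * rng.standard_normal(t.size))
freqs, spectrum_db = lp_spectrum(signal, sample_rate)
peak_hz = freqs[np.argmax(spectrum_db)]
```

The LP spectrum is a smooth envelope of the signal's spectrum, so its peaks summarize the dominant resonances with only a handful of coefficients. That makes it convenient for comparing a voice against its recovered vibration.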


Similarity Comparison

❑ We extract the peaks of the 15th-order LP spectra for similarity comparison.

❑ The number of peaks in the voice and in the vibration can differ.

❑ We match the peaks that are close on the frequency axis.

❑ We pad with 0 at a frequency point if there is no match.
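The matching and zero-padding steps above can be sketched as follows. The slides do not give the matching tolerance or the similarity metric, so the 100 Hz tolerance, the greedy nearest-peak matching, the cosine similarity, and the function names are assumptions made for illustration.

```python
import numpy as np

def align_peaks(voice_peaks, vib_peaks, tol_hz=100.0):
    """Align two peak lists of (freq_hz, magnitude) pairs.

    Peaks within `tol_hz` of each other are matched (greedy, nearest first
    per voice peak); unmatched peaks are paired with 0. Returns two
    equal-length arrays ordered by frequency.
    """
    voice_peaks, vib_peaks = sorted(voice_peaks), sorted(vib_peaks)
    used, rows = set(), []
    for f, m in voice_peaks:
        candidates = [j for j, (g, _) in enumerate(vib_peaks)
                      if j not in used and abs(g - f) <= tol_hz]
        if candidates:
            best = min(candidates, key=lambda j: abs(vib_peaks[j][0] - f))
            used.add(best)
            rows.append((f, m, vib_peaks[best][1]))
        else:
            rows.append((f, m, 0.0))          # no vibration peak: pad 0
    for j, (g, m) in enumerate(vib_peaks):
        if j not in used:
            rows.append((g, 0.0, m))          # no voice peak: pad 0
    rows.sort()
    return np.array([r[1] for r in rows]), np.array([r[2] for r in rows])

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

voice = [(500.0, 1.0), (1500.0, 0.8), (2500.0, 0.6)]
vibration = [(520.0, 0.9), (2480.0, 0.5), (3500.0, 0.4)]
v_vec, b_vec = align_peaks(voice, vibration)
similarity = cosine_similarity(v_vec, b_vec)
```

In this toy example the 1500 Hz voice peak and the 3500 Hz vibration peak have no partner, so each is paired with 0. How the resulting score is thresholded to decide live versus replayed speech is not specified in the slides.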


Experimental Evaluation: Data Collection

❑ Three types of phones with different sizes and audio chipsets.

❑ One external mic of Google Home microphone grade.

❑ Both short-range and long-range experiments.

❑ 25 participants, 2500 positive cases.

❑ Three types of loudspeakers, 2500 replay attacks.

❑ Three types of targeted attacks: occlusion, low SPL, and large angle.


Experimental Evaluation: Overall Performance


Experimental Evaluation: Attacks


Conclusion

❑ We design VibLive for secure VUI in IoT environments in both short-range and long-range applications.

❑ We develop an approach that allows VUI-capable devices to sense bone-conducted vibrations without extra hardware.

❑ VibLive supports text-independent liveness detection and can therefore protect the whole communication session.

❑ Experimental results show that VibLive is effective under various setups.
