Transcript of slides for "Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems" (2021-05-03)
Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems
Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu
Fu Song ([email protected])
Speaker Recognition Systems (SRSs)
[Figure: Alice, Bob, and an imposter speaking to an SRS]
Ubiquitous applications:
- Voice assistant wake-up
- Personalized services in smart homes
- Financial transactions
- App login
Safety-critical scenarios: once an SRS is broken, the consequences include property damage, reputation degradation, and sensitive information leakage.
⇒ The security of SRSs matters!
Mainstream implementation of SRSs: Machine Learning (ML)
However, ML is vulnerable to adversarial examples (Ian Goodfellow et al.; Nicholas Carlini et al.).
Is an adversarial attack practical on SRSs?
FAKEBOB:
- Black-box
- Applicable to general SRS tasks
- Effective on commercial SRSs
- Effective in over-the-air attacks
Threat model
Attacker goal: pass voice authentication and gain privileged access.
Attacker capability: no information about the model structure or parameters; limited to querying the victim's speaker model.
Overview of FAKEBOB
Step 1: Effective loss function design.
Goal: f(x) ≤ 0 ⟺ the attack succeeds.
The loss is built on the scoring and decision-making mechanism of the SRS, e.g. the threshold θ in the open-set identification (OSI) task, and is tailored for the different SRS tasks: close-set identification (CSI), speaker verification (SV), and OSI.
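As an illustration of this goal (a simplified sketch in the spirit of the talk's loss design, not the paper's exact formulation), a targeted loss for the OSI task can be written so that f(x) ≤ 0 holds exactly when the target speaker's score both clears the threshold θ and beats every other enrolled speaker:

```python
def osi_loss(scores, target, theta):
    """Targeted loss for the open-set identification (OSI) task.

    f(x) <= 0 iff the voice is accepted (its top score clears the
    threshold theta) AND it is identified as the target speaker.

    scores: per-enrolled-speaker scores S(x) returned by the SRS
    target: index of the target speaker (e.g. Bob)
    theta:  the system's acceptance threshold
    """
    others = [s for i, s in enumerate(scores) if i != target]
    return max([theta] + others) - scores[target]

# Toy example with two enrolled speakers and threshold 0.5:
print(osi_loss([0.9, 0.3], target=0, theta=0.5) <= 0)  # True: attack succeeds
print(osi_loss([0.4, 0.3], target=0, theta=0.5) <= 0)  # False: rejected by the threshold
```

Minimizing this quantity is what turns "pass the authentication" into an optimization problem the later steps can attack.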
Step 2: Threshold estimation.
The threshold θ is unique to VPR (voiceprint recognition) and unknown to the attacker, so FAKEBOB introduces a novel threshold estimation algorithm that yields an estimate θ̂ ≈ θ.
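The paper's estimation algorithm is interleaved with the attack itself; the sketch below captures only the bracketing intuition behind it, under the assumption (ours, for illustration) that the attacker records (score, accept/reject) pairs from its black-box queries:

```python
def estimate_threshold(observations):
    """Bracket the unknown acceptance threshold theta from black-box
    query results. observations: iterable of (score, accepted) pairs.

    Every rejected score lies below theta and every accepted score lies
    at or above it, so theta is pinned between the two extremes.
    """
    rejected = [s for s, ok in observations if not ok]
    accepted = [s for s, ok in observations if ok]
    lo = max(rejected) if rejected else float("-inf")
    hi = min(accepted) if accepted else float("inf")
    return (lo + hi) / 2  # theta_hat: midpoint of the bracket

obs = [(0.31, False), (0.48, False), (0.55, True), (0.62, True)]
print(estimate_threshold(obs))  # a value between 0.48 and 0.55
```

The more queries straddle the boundary, the tighter the bracket and the closer θ̂ gets to θ.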
Step 3: NES-based gradient estimation.
The gradient is estimated purely from the scores and decisions obtained by querying the victim speaker model, which keeps the attack black-box.
Step 4: Solve the optimization problem by gradient descent, using the estimated gradient information.
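Steps 3 and 4 can be sketched together. NES estimates the gradient from loss queries alone, averaging over antithetic Gaussian perturbations; this is a generic NES sketch (the paper's attack additionally bounds the perturbation and tunes the update rule, which is omitted here):

```python
import random

def nes_gradient(f, x, sigma=0.001, n=50):
    """Estimate the gradient of a black-box loss f at point x with
    Natural Evolution Strategies: probe f along antithetic Gaussian
    directions. Only loss queries are needed, no model internals."""
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n):
        u = [random.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        for i in range(dim):
            grad[i] += (plus - minus) * u[i]
    return [g / (2 * sigma * n) for g in grad]

def attack_step(f, x, lr=0.01):
    """One gradient-descent step on the estimated gradient (step 4)."""
    g = nes_gradient(f, x)
    return [xi - lr * gi for xi, gi in zip(x, g)]
```

Iterating `attack_step` until the loss drops to (or below) zero is the black-box counterpart of white-box gradient descent.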
Step 5: Over-the-air attack.
Noise in the air makes a plain attack ineffective. Previous work relies on a noise model, which is somewhat environment- and device-dependent; ours instead improves the confidence of the adversarial voice.
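A rough illustration of the confidence idea (the margin κ and the Gaussian noise model below are our illustrative assumptions, in the spirit of the Carlini-Wagner confidence parameter): requiring f(x) ≤ −κ instead of f(x) ≤ 0 leaves headroom for the loss shift that playback and recording noise introduce. A toy Monte-Carlo check:

```python
import random

def survives_playback(loss_clean, noise_std=0.1, trials=1000, seed=1):
    """Monte-Carlo estimate of how often an adversarial voice still
    succeeds (loss <= 0) after playback noise shifts its loss value.
    The Gaussian shift is a toy stand-in for real air-channel noise."""
    rng = random.Random(seed)
    hits = sum(loss_clean + rng.gauss(0.0, noise_std) <= 0
               for _ in range(trials))
    return hits / trials

# A voice optimized only to loss = 0 fails about half the time over the
# air, while one pushed to loss = -0.3 (a comfortable margin) almost
# always survives.
print(survives_playback(0.0))   # roughly 0.5
print(survives_playback(-0.3))  # close to 1.0
```

This is why raising the confidence of the adversarial voice, rather than modeling the noise itself, stays device- and environment-independent.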
Experimental results
Attacking open-source systems: ≈100% attack success rate (ASR).
Attacking commercial systems:
- Talentedsoft: 100% ASR, with 2,500 queries on average
- Microsoft Azure: 26% ASR
Over-the-air attack
Different distances between the loudspeaker and the microphone:

Distance (m):  0.25  0.5   1     2    4    8
ASR (%):       100   100   100   70   40   10

Different devices (at least 70% ASR) ⇒ device-independent:
- Loudspeakers: JBL portable speaker, Shinco broadcast equipment, laptop
- Microphones: iPhone, OPPO

Different acoustic environments (white / bus / restaurant / music noise): at least 48% ASR when the noise is below 60 dB ⇒ environment-independent.
Imperceptibility
Alice (the source speaker) utters an original voice; the attacker crafts an adversarial voice by adding a perturbation to it. Bob's speaker model recognizes the adversarial voice as "uttered by Bob" (the target), yet a third person who listens to it still says "it is uttered by Alice" (the source). This is what imperceptibility means in SRSs.
Quantitative analysis of imperceptibility
Q: How many people think the adversarial and original voices are uttered by the same speaker?
A: A human study on Amazon MTurk:
- API attack: 64.9% answered "same"
- Over-the-air attack: 34.0% answered "same"
Take away:
1. A black-box and practical adversarial attack against speaker recognition systems
2. Effective against commercial speaker recognition services
3. Effective in over-the-air attacks
4. Imperceptible to human hearing
FAKEBOB website: https://sites.google.com/view/fakebob/home
FAKEBOB code: https://github.com/FAKEBOB-adversarial-attack/FAKEBOB
Icons made by Freepik, xnimrodx, Eucalyp, and Becris from www.flaticon.com