Transcript of slides for "Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems" (2021-05-03)
Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems
Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu
Fu Song ([email protected])
Speaker Recognition Systems (SRSs)
[Figure: Alice, Bob, and an imposter speaking to an SRS]
Ubiquitous applications:
- Voice assistant wake-up
- Personalized services in smart homes
- Financial transactions
- App login
Safety-critical scenarios: once an SRS is broken, the consequences include property damage, reputation degradation, and sensitive information leakage.
⇒ The security of SRSs matters!
Mainstream implementation of SRSs: Machine Learning (ML)
However, ML is vulnerable to adversarial examples (Ian Goodfellow et al.; Nicholas Carlini et al.).
Is an adversarial attack practical on SRSs?
FAKEBOB:
- Black-box
- Applicable to general SRS tasks
- Effective on commercial SRSs
- Effective in over-the-air attacks
Threat model
Attacker goal: pass voice authentication and gain privileged access.
Attacker capability: no information about the model structure or parameters; limited to querying the victim's speaker model.
Overview of FAKEBOB
Step 1: Effective loss function design.
Goal: f(x) ≤ 0 ⟺ the attack succeeds.
The loss is built on the scoring and decision-making mechanism of the SRS, e.g. the threshold θ in the open-set identification (OSI) task, and is tailored for the different SRS tasks: close-set identification (CSI), speaker verification (SV), and OSI.
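As an illustration of this goal (a simplified sketch in the spirit of the talk's loss design, not the paper's exact formulation), a targeted loss for the OSI task can be written so that f(x) ≤ 0 holds exactly when the target speaker's score both clears the threshold θ and beats every other enrolled speaker:

```python
def osi_loss(scores, target, theta):
    """Targeted loss for the open-set identification (OSI) task.

    f(x) <= 0 iff the voice is accepted (its top score clears the
    threshold theta) AND it is identified as the target speaker.

    scores: per-enrolled-speaker scores S(x) returned by the SRS
    target: index of the target speaker (e.g. Bob)
    theta:  the system's acceptance threshold
    """
    others = [s for i, s in enumerate(scores) if i != target]
    return max([theta] + others) - scores[target]

# Toy example with two enrolled speakers and threshold 0.5:
print(osi_loss([0.9, 0.3], target=0, theta=0.5) <= 0)  # True: attack succeeds
print(osi_loss([0.4, 0.3], target=0, theta=0.5) <= 0)  # False: rejected by the threshold
```

Minimizing this quantity is what turns "pass the authentication" into an optimization problem the later steps can attack.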
Step 2: Threshold estimation.
The threshold θ is unique to VPR (voiceprint recognition) and unknown to the attacker, so FAKEBOB introduces a novel threshold estimation algorithm that yields an estimate θ̂ ≈ θ.
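The paper's estimation algorithm is interleaved with the attack itself; the sketch below captures only the bracketing intuition behind it, under the assumption (ours, for illustration) that the attacker records (score, accept/reject) pairs from its black-box queries:

```python
def estimate_threshold(observations):
    """Bracket the unknown acceptance threshold theta from black-box
    query results. observations: iterable of (score, accepted) pairs.

    Every rejected score lies below theta and every accepted score lies
    at or above it, so theta is pinned between the two extremes.
    """
    rejected = [s for s, ok in observations if not ok]
    accepted = [s for s, ok in observations if ok]
    lo = max(rejected) if rejected else float("-inf")
    hi = min(accepted) if accepted else float("inf")
    return (lo + hi) / 2  # theta_hat: midpoint of the bracket

obs = [(0.31, False), (0.48, False), (0.55, True), (0.62, True)]
print(estimate_threshold(obs))  # a value between 0.48 and 0.55
```

The more queries straddle the boundary, the tighter the bracket and the closer θ̂ gets to θ.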
Step 3: NES-based gradient estimation.
The gradient is estimated purely from the scores and decisions obtained by querying the victim speaker model, which keeps the attack black-box.
Step 4: Solve the optimization problem by gradient descent, using the estimated gradient information.
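Steps 3 and 4 can be sketched together. NES estimates the gradient from loss queries alone, averaging over antithetic Gaussian perturbations; this is a generic NES sketch (the paper's attack additionally bounds the perturbation and tunes the update rule, which is omitted here):

```python
import random

def nes_gradient(f, x, sigma=0.001, n=50):
    """Estimate the gradient of a black-box loss f at point x with
    Natural Evolution Strategies: probe f along antithetic Gaussian
    directions. Only loss queries are needed, no model internals."""
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n):
        u = [random.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * ui for xi, ui in zip(x, u)])
        minus = f([xi - sigma * ui for xi, ui in zip(x, u)])
        for i in range(dim):
            grad[i] += (plus - minus) * u[i]
    return [g / (2 * sigma * n) for g in grad]

def attack_step(f, x, lr=0.01):
    """One gradient-descent step on the estimated gradient (step 4)."""
    g = nes_gradient(f, x)
    return [xi - lr * gi for xi, gi in zip(x, g)]
```

Iterating `attack_step` until the loss drops to (or below) zero is the black-box counterpart of white-box gradient descent.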
Step 5: Over-the-air attack.
Noise in the air makes a plain attack ineffective. Previous work relies on a noise model, which is somewhat environment- and device-dependent; ours instead improves the confidence of the adversarial voice.
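A rough illustration of the confidence idea (the margin κ and the Gaussian noise model below are our illustrative assumptions, in the spirit of the Carlini-Wagner confidence parameter): requiring f(x) ≤ −κ instead of f(x) ≤ 0 leaves headroom for the loss shift that playback and recording noise introduce. A toy Monte-Carlo check:

```python
import random

def survives_playback(loss_clean, noise_std=0.1, trials=1000, seed=1):
    """Monte-Carlo estimate of how often an adversarial voice still
    succeeds (loss <= 0) after playback noise shifts its loss value.
    The Gaussian shift is a toy stand-in for real air-channel noise."""
    rng = random.Random(seed)
    hits = sum(loss_clean + rng.gauss(0.0, noise_std) <= 0
               for _ in range(trials))
    return hits / trials

# A voice optimized only to loss = 0 fails about half the time over the
# air, while one pushed to loss = -0.3 (a comfortable margin) almost
# always survives.
print(survives_playback(0.0))   # roughly 0.5
print(survives_playback(-0.3))  # close to 1.0
```

This is why raising the confidence of the adversarial voice, rather than modeling the noise itself, stays device- and environment-independent.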
Experimental results
Attacking open-source systems: ≈100% attack success rate (ASR).
Attacking commercial systems:
- Talentedsoft: 100% ASR, with 2,500 queries on average
- Microsoft Azure: 26% ASR
Over-the-air attack
Different distances between the loudspeaker and the microphone:

Distance (m):  0.25  0.5   1     2    4    8
ASR (%):       100   100   100   70   40   10

Different devices (at least 70% ASR) ⇒ device-independent:
- Loudspeakers: JBL portable speaker, Shinco broadcast equipment, laptop
- Microphones: iPhone, OPPO

Different acoustic environments (white / bus / restaurant / music noise): at least 48% ASR when the noise is below 60 dB ⇒ environment-independent.
Imperceptibility
Alice (the source speaker) utters an original voice; the attacker crafts an adversarial voice by adding a perturbation to it. Bob's speaker model recognizes the adversarial voice as "uttered by Bob" (the target), yet a third person who listens to it still says "it is uttered by Alice" (the source). This is what imperceptibility means in SRSs.
Quantitative analysis of imperceptibility
Q: How many people think the adversarial and original voices are uttered by the same speaker?
A: A human study on Amazon MTurk:
- API attack: 64.9% answered "same"
- Over-the-air attack: 34.0% answered "same"
Take away:
1. A black-box and practical adversarial attack against speaker recognition systems
2. Effective against commercial speaker recognition services
3. Effective in over-the-air attacks
4. Imperceptible to human hearing
FAKEBOB website: https://sites.google.com/view/fakebob/home
FAKEBOB code: https://github.com/FAKEBOB-adversarial-attack/FAKEBOB
Icons made by Freepik, xnimrodx, Eucalyp, and Becris from www.flaticon.com