Who is Real Bob? Adversarial Attacks on Speaker Recognition … · 2021. 5. 3. · Who is Real Bob?...

Post on 07-Sep-2021

4 views 0 download

Transcript of Who is Real Bob? Adversarial Attacks on Speaker Recognition … · 2021. 5. 3. · Who is Real Bob?...

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu

Fu Song (songfu@shanghaitech.edu.cn)

Speaker Recognition Systems (SRSs)

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Ubiquitous Application

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Ubiquitous Application

Voice assistant wake up

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction

Speaker Recognition Systems (SRSs)

Bob ImposterAlice

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction App log in

WeChat

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction App log in

WeChat

Safety-critical scenario

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction App log in

WeChat

Safety-critical scenarioOnce broken

Safety-critical scenarioOnce broken

property damage

reputation degrade

sensitive information leak…

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction App log in

WeChat

Security of SRSs!!!

Ubiquitous Application

Voice assistant wake up Personalized service on smart home

Financial transaction App log in

WeChat

Mainstream implementation of SRSs

Mainstream implementation of SRSs

Machine Learning (ML)

Mainstream implementation of SRSs

Machine Learning (ML)

However,

Mainstream implementation of SRSs

Machine Learning (ML)

However, ML is vulnerable to adversarial examples

Mainstream implementation of SRSs

Machine Learning (ML)

However, ML is vulnerable to adversarial examples

Ian Goodfellow et al.

Mainstream implementation of SRSs

Machine Learning (ML)

However, ML is vulnerable to adversarial examples

Ian Goodfellow et al. Nicholas Carlini et al.

Is adversarial attack practical on SRSs ?

Is adversarial attack practical on SRSs ?

FAKEBOBBlack-boxAppliable to general SRS taskEffective on commercial SRSsEffective in over-the-air attack

Threat model

Threat modelAttacker Goal: pass voice authentication; gain access to privilege

Threat modelAttacker Goal: pass voice authentication; gain access to privilege

Attacker Capability: no information about model structure / parameter;

Threat modelAttacker Goal: pass voice authentication; gain access to privilege

Attacker Capability: no information about model structure / parameter; limited to query the speak model of the victims

Overview of FAKEBOB

1 2

3 4

Overview of FAKEBOB

1 Effective loss function design.

1 2

3 4

Overview of FAKEBOB

1 Effective loss function design. Goal: 𝑓𝑓 𝑥𝑥 ≤ 0 ↔ attack succeeds

1 2

3 4

Overview of FAKEBOB

1 Effective loss function design. Goal: 𝑓𝑓 𝑥𝑥 ≤ 0 ↔ attack succeedsBased on scoring and decision-making mechanism

1 2

3 4

Overview of FAKEBOB

1 Effective loss function design. Goal: 𝑓𝑓 𝑥𝑥 ≤ 0 ↔ attack succeedsBased on scoring and decision-making mechanism

Open-set identification (OSI) task𝜃𝜃: threshold

1 2

3 4

Overview of FAKEBOB

1 Effective loss function design. Goal: 𝑓𝑓 𝑥𝑥 ≤ 0 ↔ attack succeedsBased on scoring and decision-making mechanism

Open-set identification (OSI) task𝜃𝜃: threshold

Tailored for different SRSs tasks: CSI, SV, OSI

1 2

3 4

Overview of FAKEBOB

2 Threshold: unique in VPR; unknown to attacker

1 2

3 4

Overview of FAKEBOB

2 Threshold: unique in VPR; unknown to attackerNovel threshold estimation algorithm

1 2

3 4

Overview of FAKEBOB

2 Threshold: unique in VPR; unknown to attackerNovel threshold estimation algorithm

1 2

3 4

Overview of FAKEBOB

2 Threshold: unique in VPR; unknown to attackerNovel threshold estimation algorithm

�̂�𝜃 ≈ 𝜃𝜃

1 2

3 4

Overview of FAKEBOB

3 NES-based gradient estimation

1 2

3 4

Overview of FAKEBOB

3 NES-based gradient estimation

rely on scores and decisions by querying victim speaker model

1 2

3 4

Overview of FAKEBOB

3 NES-based gradient estimation

rely on scores and decisions by querying victim speaker model Black-box

1 2

3 4

Overview of FAKEBOB

3 NES-based gradient estimation

4 Solve the optimization problem by gradient descent

gradient information

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attack

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attacknoise in air makes attack ineffective

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attack

previous work: noise modelnoise in air makes attack ineffective

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attack

somehow environment- and device- dependentprevious work: noise modelnoise in air makes attack ineffective

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attack

somehow environment- and device- dependentprevious work: noise modelnoise in air makes attack ineffective

ours: improve confidence

1 2

3 4

Overview of FAKEBOB

5 Over-the-air attack

somehow environment- and device- dependentprevious work: noise modelnoise in air makes attack ineffective

ours: improve confidence

1 2

3 4

Experimental result

Attack Open-source

Experimental result

Attack Open-source

≈ 100% attack success rate (ASR)

Experimental result

Attack Open-source

≈ 100% attack success rate (ASR)

Attack Commercial

Experimental result

Attack Open-source

≈ 100% attack success rate (ASR)

Talentedsoft: 100% ASR; 2500 query on averageAttack Commercial

Experimental result

Attack Open-source

≈ 100% attack success rate (ASR)

Microsoft Azure: 26% ASR

Talentedsoft: 100% ASR; 2500 query on averageAttack Commercial

Experimental result

Over the air Attack

Experimental result

Over the air Attack

Experimental result

different distance between loundspeakerand microphone

Distance (meter) 0.25 0.5 1 2 4 8ASR (%) 100 100 100 70 40 10

Over the air Attack

Experimental result

Different devices (at least 70% ASR)

JBL portable speaker

Shinco broadcast equipment

Loundspeaker:

Laptop

iPhone OPPO

Microphone:

different distance between loundspeakerand microphone

Distance (meter) 0.25 0.5 1 2 4 8ASR (%) 100 100 100 70 40 10

Device independent

Over the air Attack

Experimental result

JBL portable speaker

Shinco broadcast equipment

Loundspeaker:

Laptop

different acoustic environments White / Bus / Restaurant / Music noiseat least 48% ASR when noise < 60 dB

Different devices (at least 70% ASR)different distance between loundspeakerand microphone

Distance (meter) 0.25 0.5 1 2 4 8ASR (%) 100 100 100 70 40 10

iPhone OPPO

Microphone:

Device independentEnvironment independent

Imperceptibility

Imperceptibility

originalvoiceAlice

(source speaker)

utter

Imperceptibility

+

perturbation

originalvoice

adversarialvoice

craft

utter

Alice(source speaker)

Imperceptibility

+ “It is uttered by Bob”

Bob’s speaker model

perturbation

originalvoice

adversarialvoice

craft

utterrecognized (target)

Alice(source speaker)

Imperceptibility

+ “It is uttered by Bob”

“It is uttered by Alice”

Bob’s speaker model

perturbation

originalvoice

adversarialvoice

craft

utterrecognized

listened

The third person

(target)

(source)

Alice(source speaker)

Imperceptibility

+ “It is uttered by Bob”

“It is uttered by Alice”

Bob’s speaker model

perturbation

originalvoice

adversarialvoice

craft

utterrecognized

listened

The third person

imperceptibility in SRSs

(target)

(source)

Alice(source speaker)

Imperceptibility

quantitative analysis of imperceptibility

Imperceptibility

Q: How many people think adversarial and original voices are uttered by the same speaker ?

quantitative analysis of imperceptibility

Imperceptibility

Q: How many people think adversarial and original voices are uttered by the same speaker ?

A: Human Study on Amazon MTurk

quantitative analysis of imperceptibility

Imperceptibility

Q: How many people think adversarial and original voices are uttered by the same speaker ?

A: Human Study on Amazon MTurk

API attack: 64.9% same

quantitative analysis of imperceptibility

Imperceptibility

Over-the-air attack: 34.0% same

API attack: 64.9% same

Q: How many people think adversarial and original voices are uttered by the same speaker ?

A: Human Study on Amazon MTurk

quantitative analysis of imperceptibility

Take away:1. Black-box and practical adversarial attack against

speaker recognition systems2. Effective to commercial speaker recognition services3. Effective in over-the-air attack4. Imperceptible to human hearing

FAKEBOB Website: https://sites.google.com/view/fakebob/home

FAKEBOB Code:https://github.com/FAKEBOB-adversarial-attack/FAKEBOB

Icon made by Freepik from www.flaticon.com

Icon made by xnimrodx from www.flaticon.com

Icon made by Eucalyp from www.flaticon.com

Icon made by Becris from www.flaticon.com

Take away:1. Black-box and practical adversarial attack against

speaker recognition systems2. Effective to commercial speaker recognition services3. Effective in over-the-air attack4. Imperceptible to human hearing

FAKEBOB Website: https://sites.google.com/view/fakebob/home

FAKEBOB Code:https://github.com/FAKEBOB-adversarial-attack/FAKEBOB