Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and...

52
Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko, Luke Zettlemoyer, Tadayoshi Kohno sim [email protected] homes.cs.washington.edu/~simkol

Transcript of Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and...

Page 1: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution

Lucy Simko, Luke Zettlemoyer, Tadayoshi Kohno

sim

[email protected] homes.cs.washington.edu/~simkol

Page 2: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!2

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

?

Page 3: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Caliskan-Islam et al. “De-anonymizing programmers via code stylometry.” 24th USENIX Security Symposium (USENIX Security), Washington, DC. 2015.

● 98% accuracy over 250 programmers ● Extract syntactic, lexical, and layout features from C/C++ code ● Random Forest classifier ● Data set: Google Code Jam

○ Programming competition ○ Lots of examples of people solving the same problem in different ways

● Open source

!3

State of the Art: Source Code Attribution

Page 4: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!4

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

?

98% accuracy!

Page 5: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!5

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

?

98% accuracy!

Page 6: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!6

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

?

98% accuracy!

CENSORED

CENSORED

Page 7: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Can we fool source code attribution classifiers?

!7

Research Question

Yes!

Methodology: Lab study* with C programmers

*Approved by University of Washington’s Human Subjects Division (IRB)

Page 8: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Motivation and Research Question

● Source Code Attribution: Overview and Background ● Evading Source Code Attribution: Definitions and Goals

● Methodology

● Results: Conservative Estimate of Adversarial Success

● Results: How to Create Forgeries

!8

Outline

Page 9: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!9

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

?

Page 10: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!10

Source Code Attribution

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

DC

A

B

E

F

Classifier

Page 11: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1; ...

!11

Source Code Attribution

DC

A

B

E

F

ClassifierPc

Page 12: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!12

Source Code Attribution

Classifier

{A, B, C, D, E}

Pc C

Page 13: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!13

Source Code Attribution

ClassifierPc

{A, B, C, D, E}

C ✓Who the classifier thinks wrote this code.

Page 14: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Motivation and Research Question

● Source Code Attribution: Overview and Background

● Evading Source Code Attribution: Definitions and Goals ● Methodology

● Results: Conservative Estimate of Adversarial Success

● Results: How to Create Forgeries

!14

Outline

Page 15: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

1. Train: Given code from original and target authors, learn styles 2. Modify original code to imitate target author (forgery)

● Or just hide the original author’s style (masking)

!15

Evading Source Code Attribution

PcAdversarial manipulation Pc’

Code originally by C, but modified by an adversary.

Page 16: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

1. Train: Given code from original and target authors, learn styles 2. Modify original code to imitate target author (forgery)

● Or just hide the original author’s style (masking)

!16

Evading Source Code Attribution

PcAdversarial manipulation Pc’

{A, B, C, D, E}

Classifier A

Forgery

Page 17: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Motivation and Research Question

● Source Code Attribution: Overview and Background

● Evading Source Code Attribution: Definitions and Goals

● Methodology ● Results: Conservative Estimate of Adversarial Success

● Results: How to Create Forgeries

!17

Outline

Page 18: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lab Study: Dataset

● C code ● We used a linter1 to eliminate many typographic style differences ● ~4000 authors: avg 2.2 files each ● 5 authors with the most files: avg ~42.8 files

○ Authors: A, B, C, D, E

1 http://astyle.sourceforge.net/

Page 19: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lab Study: Create Forgeries

C5

{A, B, C, D, E}

Precision: 100% Recall: 100% (10-fold XV)

Page 20: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lab Study: Create Forgeries

C20

{A, B, C, D, E, ... + 15}

Precision: 87.6% Recall: 88.2% (10-fold XV)

Page 21: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lab Study: Create Forgeries

C50

{A, B, C, D, E, ... + 45}

Precision: 82.3% Recall: 84.5% (10-fold XV)

Page 22: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!22

Lab Study: Create Forgeries

28 C programmers (participants): 1. Train: Given code from original and target author, learn styles 2. Modify original code to imitate target author’s style (forgery)

PxParticipant modifies Px

Classifier Y

Forgery

X, Y ∈ {A, B, C, D, E}

Px’

Page 23: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!23

Lab Study: Create Forgeries

28 C programmers (participants): 1. Train: Given code from original and target author, learn styles 2. Modify original code to imitate target author’s style (forgery) 3. Check forgery success against oracle classifiers

X, Y ∈ {A, B, C, D, E}

Px’

XC5

YY

C20

C50

PxParticipant modifies Px

Page 24: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Motivation and Research Question

● Source Code Attribution: Overview and Background

● Evading Source Code Attribution: Definitions and Goals

● Methodology

● Results: Conservative Estimate of Adversarial Success ● Results: How to Create Forgeries

!24

Outline

Page 25: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Percent of final forgery attempts that were successful attacks

C5 C20 C50

Forgery 66.6% 70.0% 73.0%

Masking 76.6% 76.6% 86.6%

!25

Versions of the state-of-the-art machine classifier. The subscript indicates the number of authors in the training set.

Results: Estimate of Adversarial Success

Page 26: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

C5 C20 C50

Forgery 66.6% 70.0% 73.0%

Masking 76.6% 76.6% 86.6%

Percent of final forgery attempts that were successful attacks !26

Forgery: adversary is pretending to be a specific target author. Masking: adversary is obscuring the original author.

Results: Estimate of Adversarial Success

Page 27: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

C5 C20 C50

Forgery 66.6% 70.0% 73.0%

Masking 76.6% 76.6% 86.6%

Percent of final forgery attempts that were successful attacks!27

A successful forgery attack means the classifier output the target author instead of the original author of the code. 66.6% of forgery attacks against the C5 classifier were successful.

Results: Estimate of Adversarial Success

Page 28: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

C5 C20 C50

Forgery 66.6% 70.0% 73.0%

Masking 76.6% 76.6% 86.6%

Percent of final forgery attempts that produced a misclassification!28

C50 attributed forgeries correctly only 13.4% of the time.

Results: Estimate of Adversarial Success

Page 29: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Percent of final forgery attempts that produced a misclassification

Lesson: Non-experts can successfully attack this state-of-the-art classifier, suggesting other authorship classifiers may be vulnerable to the same type of attacks.

!29

Results: Estimate of Adversarial Success

C5 C20 C50

Forgery 66.6% 70.0% 73.0%

Masking 76.6% 76.6% 86.6%

Page 30: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Motivation and Research Question

● Source Code Attribution: Overview and Background

● Evading Source Code Attribution: Definitions and Goals

● Methodology

● Results: Conservative Estimate of Adversarial Success

● Results: How to Create Forgeries

!30

Outline

Page 31: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lesson: Forgers did not know the features the classifier was using for attribution. This suggests that forgeries in the wild might contain the same types of modifications.

!31

Results: Methods of Forgery Creation

Page 32: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Example: Two Programs by Author C// libraries imported #define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n) // variables defined int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1;

// libraries imported #define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n) // variables defined int main() { int i, j, k, l, m, n, t, ok; int a, b, c; int size, count = 0; scanf ("%d", &size);

while (size--) { scanf ("%d%d", &n, &m); rep (i, m) { scanf ("%d", s + i);

Page 33: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

// libraries imported #define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n) // variables defined int main() { int i, j, k, l, m, n, st; char in[10000]; int fg[5000], chk[128]; int size, count = 0, res; scanf ("%d%d%d", &len, &n, &size); rep (i, n) scanf ("%s", dic[i]);

while (size--) { scanf ("%s", in); st = 0; rep (k, n) fg[k] = 1;

Example: Two Programs by Author C// libraries imported #define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n) // variables defined int main() { int i, j, k, l, m, n, t, ok; int a, b, c; int size, count = 0; scanf ("%d", &size);

while (size--) { scanf ("%d%d", &n, &m); rep (i, m) { scanf ("%d", s + i);

Page 34: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Example: Forgery of Author C

Information Structure

● Variable name ● Syntax ● Macros ● API calls

Control Flow

● Loop type

Page 35: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Example: Creating a Forgery of Author C

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

!35Classifier output: A

Page 36: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

ORIGINAL FORGERY

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

!36Classifier output: A Classifier output: ??

Page 37: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!37Classifier output: A Classifier output: ??

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

Page 38: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!38Classifier output: A Classifier output: ??

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; cin >> size; for(count=1;count<=size;count++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

Page 39: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!39Classifier output: A Classifier output: ??

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; cin >> size; for(count=1;count<=size;count++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

Page 40: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!40Classifier output: A Classifier output: ??

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; scanf("%d", &size); for(count=1;count<=size;count++) { scanf("%d%d%d%d", &D, &I, &M, &N); for(i=0; i<N; i++) scanf("%d", original+i); ...

Page 41: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!41Classifier output: A Classifier output: ??

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; scanf("%d", &size); while (size--) { scanf("%d%d%d%d", &D, &I, &M, &N); rep (i,N) scanf("%d", original+i); ...

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

Page 42: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!42Classifier output: A Classifier output: ??

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; scanf("%d", &size); while (size--) { scanf("%d%d%d%d", &D, &I, &M, &N); rep (i,N)scanf("%d", original+i); ...

Page 43: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

ORIGINAL FORGERY

!43Classifier output: A Classifier output: C

int main() { int i,j,k; int cc,ca; cin >> ca; for(cc=1;cc<=ca;cc++) { cin >> D >> I >> M >> N; for(i=0; i<N; i++) cin >> original[i]; ...

#define REP(i,a,b) for(i=a;i<b;i++) #define rep(i,n) REP(i,0,n)

int main() { int i,j,k; int size, count = 0; scanf("%d", &size); while (size--) { scanf("%d%d%d%d", &D, &I, &M, &N); rep (i,N)scanf("%d", original+i); ...

Page 44: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!44

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls

Control Flow

● Loop type

Page 45: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!45

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

Page 46: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!46

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

Local modifications: only need to understand a line or two of code Lo

cal

Page 47: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!47

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

Local modifications: only need to understand a line or two of code

Algorithmic modifications: need a more comprehensive understanding of the code

Loca

l

Page 48: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!48

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

● Variable type ● Data structures ● Static and dynamic

memory usage

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

Local modifications: only need to understand a line or two of code

Algorithmic modifications: need a more comprehensive understanding of the code

Loca

lA

lgor

ithm

ic

Page 49: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!49

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

● Variable type ● Data structures ● Static and dynamic

memory usage

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

Local modifications: only need to understand a line or two of code

Algorithmic modifications: need a more comprehensive understanding of the code

Loca

lA

lgor

ithm

ic

X

Page 50: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

!50

Results: Methods of Forgery Creation

Information Structure

● Variable name ● Syntax ● Macros ● API calls ● Libraries imported ● Variable decl location

● Variable type ● Data structures ● Static and dynamic

memory usage

Control Flow

● Loop type ● If-statements ● Assignments per line ● Control flow keywords ● Loop logic

● Functions refactored ● Inlined API calls ● Major addition or

removal of control structures

Local modifications: only need to understand a line or two of code

Algorithmic modifications: need a more comprehensive understanding of the code

Loca

lA

lgor

ithm

ic

X

Page 51: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

Lessons from methods of forgery creation:

● Local modifications are common.

● Some forgers copied code directly the target author’s training set.

!51

Results: Methods of Forgery Creation

Page 52: Recognizing and Imitating Programmer Style: Adversaries in … · 2019-08-16 · Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution Lucy Simko,

● Programmers desiring privacy or with malicious intent may seek to

evade source code attribution classifiers

● Lab study with C programmers producing forgeries, showing

unsophisticated adversaries can fool a state of the art classifier

● Forgeries were successful with local changes that do not require a

high-level understanding of the programming style.

● More recommendations in paper! My coauthors: Luke Zettlemoyer, Tadayoshi Kohno Contact me: Lucy Simko, [email protected], https://homes.cs.washington.edu/~simkol/

!52

Summary