Reachability Analysis for Callbacks. Tang Hao, Peking University. 2015.4.25.


Page 1: Reachability Analysis for Callbacks

Tang Hao, Peking University
[email protected]

2015.4.25

Page 2: Outline

Introduction
• Program Analysis
    • Graph reachability problem
• Summary-based Analysis
    • One challenge: callbacks
• CFL-reachability

Reachability Analysis for Callbacks
• Callbacks: conditions
• TAL-reachability: conditional reachability

Page 3: Introduction: Program Analysis

Generally speaking, program analysis is the automated analysis of program behaviors.

Flow analysis tasks
• data/control flow analysis
• information flow analysis (security)
• points-to/alias analysis
• …
can be modeled as graph reachability problems.

Page 4: Introduction: Example

Example: transitive data dependence analysis

    int gcd(int a, int b, string msg) {
        write(msg);
        while (b != 0) {
            int tmp = a % b;
            a = b;
            b = tmp;
        }
        return a;
    }

[Figure: data dependence graph over the nodes a, b, tmp, ret, and msg]

All transitive data dependence relationships:

a --> b, b --> a, a --> tmp, b --> tmp, tmp --> a, tmp --> b, a --> ret, b --> ret, tmp --> ret
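The relationships listed above can be reproduced mechanically. The following is a minimal Python sketch, not the talk's implementation: the direct dependences are read off the gcd body by hand (an assumption of this sketch), and a Warshall-style closure derives the transitive ones. Self-dependences such as a --> a are dropped because the slide does not list them.

```python
from itertools import product

# Direct data dependences read off the gcd body by hand (assumption).
# (x, y) means "y depends on x", written x --> y on the slide.
DIRECT = {
    ("a", "tmp"), ("b", "tmp"),  # tmp = a % b
    ("b", "a"),                  # a = b
    ("tmp", "b"),                # b = tmp
    ("a", "ret"),                # return a
}
NODES = ["a", "b", "tmp", "ret", "msg"]


def transitive_closure(nodes, edges):
    """Warshall-style closure: k is the outermost loop variable."""
    reach = set(edges)
    for k, i, j in product(nodes, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return reach


closure = transitive_closure(NODES, DIRECT)
# Drop self-dependences (a --> a, ...), which the slide does not list.
deps = sorted((x, y) for x, y in closure if x != y)
for x, y in deps:
    print(f"{x} --> {y}")
```

Under these assumptions the script prints the nine relationships from the slide; msg does not appear because nothing in the function body depends on it.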

Page 5: Introduction: Summary-based Analysis

Summarizing the behaviors of a component (modular/compositional analysis)
• Result: a summary
• Goal:
    • reusable: to reuse analysis results
    • concise: to hide internal complexity
    • efficient: to avoid unnecessary re-computation
• A general model
    • a summary function that transfers information from entries to exits

Page 6: Introduction: Example

Example: transitive data dependence analysis of gcd(int a, int b, string msg), the function shown on Page 4

All transitive data dependence relationships:

a --> b, b --> a, a --> tmp, b --> tmp, tmp --> a, tmp --> b, a --> ret, b --> ret, tmp --> ret

Summary:

a --> ret, b --> ret
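As a rough illustration of how the summary follows from the full relation, the sketch below keeps only the pairs that start at an entry and end at an exit. Treating a, b, and msg as entries and ret as the only exit is an assumption based on the gcd signature, and the function name summarize is hypothetical.

```python
# The nine transitive dependences listed on the slide.
ALL_DEPS = {
    ("a", "b"), ("b", "a"), ("a", "tmp"), ("b", "tmp"), ("tmp", "a"),
    ("tmp", "b"), ("a", "ret"), ("b", "ret"), ("tmp", "ret"),
}
ENTRIES = {"a", "b", "msg"}  # parameters of gcd (assumption)
EXITS = {"ret"}              # return value of gcd (assumption)


def summarize(deps, entries, exits):
    """Keep only entry-to-exit pairs; internal nodes such as tmp vanish."""
    return sorted((x, y) for x, y in deps if x in entries and y in exits)


summary = summarize(ALL_DEPS, ENTRIES, EXITS)
print(summary)  # [('a', 'ret'), ('b', 'ret')]
```

The local variable tmp disappears from the result, which is the sense in which a summary hides internal complexity.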

Page 7: Introduction: Summary-based Analysis

Summarizing the behaviors of a component (modular/compositional analysis)

Challenge: handling incompleteness (incomplete/partial program analysis)
• calling context
    • unknown parameters
    • global variables
    • …
• callbacks (due to dynamic dispatch)
    • unknown client code

Page 8: Introduction: Example

    class Math {
        int gcd(int a, int b);
        int gcd20(int a) { return gcd(a, 20); }
    }

    class Math1 extends Math {
        int gcd(int a, int b) {…}
    }
    class Math2 extends Math {
        int gcd(int a, int b) {…}
    }

    // main
    Math2 m = new Math2();
    int x = 30;
    int y = m.gcd20(x);

[Figure: call graph over Main, Math::gcd20, Math::gcd, Math1::gcd, and Math2::gcd, with regions labeled "client code" and "library code"; the library code is marked incomplete]

Page 9: CFL-reachability: Interprocedural Analysis

IFDS/IDE [Reps et al. 1995, Sagiv et al. 1996]
• Realizable path: matched parentheses
• Filtering out unrealizable paths

    void fun() {
        …
        y1 = p(x1);
        …
        y2 = p(x2);
        …
    }

    int p(int x) { … }

[Figure: the two calls to p, with edges labeled {1, }1, {2, and }2]

Matched parenthesis language:

$S \to e \mid S\,S \mid \{_i\, S\, \}_i, \quad i = 1, 2, \ldots$

* In the following, we only discuss realizable paths (reachability) defined by the matched parenthesis language.
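To make the matched-parenthesis condition concrete, here is a small sketch that checks whether the label sequence of a single path is matched. Encoding labels as the strings "{1", "}1", and "e" is an assumption of this sketch, and the check uses a stack rather than the grammar itself.

```python
def is_matched(labels):
    """Return True if every "{i" is closed by the "}i" of the same call site.

    Note: unlike the grammar as written, this stack check also accepts an
    empty body such as ["{1", "}1"].
    """
    stack = []
    for label in labels:
        if label.startswith("{"):
            stack.append(label[1:])          # remember the call site index
        elif label.startswith("}"):
            if not stack or stack.pop() != label[1:]:
                return False                 # returns to the wrong call site
        # "e" (normal edge) does not affect matching
    return not stack


# Enter p from call site 1 and return to call site 1: realizable.
print(is_matched(["{1", "e", "}1"]))
# Enter p from call site 1 but return to call site 2: unrealizable.
print(is_matched(["{1", "e", "}2"]))
```

The second path is the kind that the analysis filters out: it enters p through the first call in fun and leaves through the second.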

Page 10: CFL-reachability: Algorithm

Algorithm: dynamic programming (similar to the Floyd-Warshall algorithm), $O(n^3)$

[Figure: example graph over nodes a, b, p, q, x, y with edges labeled {1, {2, }1, }2, and e, and derived S edges]

Matched parenthesis language:

$S \to e \mid S\,S \mid \{_i\, S\, \}_i, \quad i = 1, 2, \ldots$

Graph
• Invocation edge: $\{_i$
• Return edge: $\}_i$
• Normal edge: $e$
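One way to see how the grammar drives the analysis is a naive fixpoint over S facts. The sketch below is not the cubic dynamic programming algorithm mentioned on the slide; it simply reapplies the three productions until nothing changes. The example graph (nodes x1, y1, x2, y2, p_in, p_out) is hypothetical and mirrors a function p that is called from two call sites.

```python
def cfl_reach(edges):
    """edges: iterable of (src, label, dst) with label "e", "{i" or "}i".

    Returns the set of pairs (u, v) such that some path from u to v spells
    a string of the matched parenthesis language.
    """
    edges = list(edges)
    opens = [(u, l[1:], v) for u, l, v in edges if l.startswith("{")]
    closes = [(u, l[1:], v) for u, l, v in edges if l.startswith("}")]
    S = {(u, v) for u, l, v in edges if l == "e"}        # S -> e
    changed = True
    while changed:
        new = set()
        for (u, v) in S:                                 # S -> S S
            for (v2, w) in S:
                if v == v2:
                    new.add((u, w))
        for (a, i, u) in opens:                          # S -> {i S }i
            for (v, j, b) in closes:
                if i == j and (u, v) in S:
                    new.add((a, b))
        changed = not (new <= S)
        S |= new
    return S


# Hypothetical graph: p (p_in --e--> p_out) is called from two sites.
EDGES = [
    ("x1", "{1", "p_in"), ("p_out", "}1", "y1"),
    ("x2", "{2", "p_in"), ("p_out", "}2", "y2"),
    ("p_in", "e", "p_out"),
]
reach = cfl_reach(EDGES)
print(sorted(reach))
```

In this example x1 reaches y1 and x2 reaches y2, while the mismatched pair (x1, y2) is not derived.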

Page 11: Reachability Analysis for Callbacks: Summarizing an “Incomplete” Graph

• Postponing the analysis of callbacks
• Leaving unnecessary nodes in the summary

$S \to e \mid S\,S \mid \{_i\, S\, \}_i, \quad i = 1, 2, \ldots$

[Figure: library graph over nodes a, b, c, d with edges {2, {3, {4 and }4, }3, }2; outside the library, nodes u, v, x, y with edges {1, }1, {5, }5. The callback site is d = g(c), where g is a callback function.]

Page 12: Reachability Analysis for Callbacks

[Figure: library graph over nodes a, b, c, d with edges {2, {3, {4 and }4, }3, }2; the callback site is annotated "Conditional Reachability"]

Page 13: Reachability Analysis for Callbacks: Conditional Reachability

Conditional reachability
$CR_{a,b}(x,y)$: $x \leadsto y$, if $a \leadsto b$

Unconditional reachability (by CFL-reachability)
$UR(x,y)$: $x \leadsto y$

Summary: $CR_{a,b}(x,y)$ and $UR(x,y)$

[Figure: two paths over the nodes x, a, b, y with { and } edges]

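Page 14: Reachability Analysis for Callbacks: Client-code Analysis

Turn conditional reachability into unconditional reachability
• if the condition is satisfied

Once the client code establishes $c \leadsto d$, the fact $CR_{c,d}(a,b)$ becomes $UR(a,b)$.

[Figure: library graph over nodes a, b, c, d, connected to the client nodes x and y through edges {1, }1, {5, }5]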
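The instantiation step can be sketched in a few lines. This is an illustration under assumptions, not the talk's algorithm: a conditional fact is encoded as a pair (condition, conclusion), where both parts are node pairs, and the function name resolve is hypothetical. The client is assumed to have already established c ~> d, for example through the callback body between x and y.

```python
def resolve(ur, cr):
    """Turn conditional facts into unconditional ones.

    ur: set of (x, y), meaning x ~> y holds unconditionally.
    cr: set of ((a, b), (x, y)), meaning x ~> y holds if a ~> b.
    Repeats until no further condition can be discharged.
    """
    ur = set(ur)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in cr:
            if condition in ur and conclusion not in ur:
                ur.add(conclusion)
                changed = True
    return ur


# Library summary: a ~> b holds if c ~> d (the callback site).
library_cr = {(("c", "d"), ("a", "b"))}
# Client analysis (assumed result): the callback connects c to d.
client_ur = {("c", "d")}

print(sorted(resolve(client_ur, library_cr)))  # [('a', 'b'), ('c', 'd')]
print(sorted(resolve(set(), library_cr)))      # []
```

Without the client fact the condition stays open and a ~> b is not reported, which is how the summary postpones the analysis of the callback.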

Page 15: Reachability Analysis for Callbacks: Library Summarization

Unconditional Reachability
• CFL-reachability

Conditional Reachability
• ?

Page 16: Reachability Analysis for Callbacks: Tree Adjoining Language (TAL)

Tree Adjoining Language
• Mildly context-sensitive language
• Parsable in $O(n^6)$
• Application: natural language processing

Our Contribution
• TAL-reachability: conditional reachability

Page 17: Reachability Analysis for Callbacks: Tree Adjoining Language

• Strings
• Non-terminals

Page 18: Reachability Analysis for Callbacks

Operators

Page 19: Reachability Analysis for Callbacks

First-order “S”
• One string
• Reachability for a 2-tuple (x, y)
• One path

[Figure: path over the nodes x, a, b, y with { and } edges]

$UR(x,y)$

Page 20: Reachability Analysis for Callbacks

Second-order “𝕊”
• A pair of strings
• Reachability for a 4-tuple (x, a, b, y)
• A pair of paths

[Figure: pair of paths over the nodes x, a, b, y with { and } edges]

$CR_{a,b}(x,y)$

Page 21: Reachability Analysis for Callbacks: Operators

Operations for TAL-reachability

[Figure: operators on α and β, with nodes a, b, p, q]

Page 22: Reachability Analysis for Callbacks: Algorithm

Result: a concise and efficient summary

Keep three types of node (empirically about 10% of the nodes)
• boundary nodes (entries and exits of the library)
• chaining nodes
• hidden chaining nodes

Evaluation: about 8X speed-up in client-code analysis

Page 23: Reachability Analysis for Callbacks: Future Work

Callback analysis for real applications
• Android / Web applications

A more general case
• Handling multiple callbacks in a path

Page 24: Conclusion

An important question, but few research papers
• Callbacks in summary-based analysis techniques

Borrow ideas from other research fields
• Tree adjoining language (NLP)

Create conditions for unknown facts
• Instantiate when facts are available

Page 25: Thank you!

Page 26: Library Summarization: TAL Reachability

Complete TAL Grammar

Page 27: Library Summarization: TAL Reachability

[Figure: nodes x1, y1, y2, x2 with edges labeled {i and }i]

$\{_i(x_1, y_1) + \}_i(y_2, x_2) \Rightarrow CR_{y_1,y_2}(x_1, x_2)$

Page 28: Library Summarization: TAL Reachability

[Figure: nodes x1, x2, and y]

$CR_{y,y}(x_1, x_2) \Rightarrow UR(x_1, x_2)$

Page 29: Library Summarization: TAL Reachability

[Figure: nodes x1, x2, y1, y2, z1, z2]

$CR_{y_1,y_2}(x_1, x_2) + CR_{z_1,z_2}(y_1, y_2) \Rightarrow CR_{z_1,z_2}(x_1, x_2)$

Page 30: Library Summarization: TAL Reachability

[Figure: nodes x0, x1, x2, y1, y2]

$CR_{y_1,y_2}(x_1, x_2) + UR(x_0, x_1) \Rightarrow CR_{y_1,y_2}(x_0, x_2)$
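The four rules shown on Pages 27 to 30 can be read as a closure computation. The sketch below applies only those four rules until a fixpoint is reached; it does not cover the complete TAL grammar, and it makes no attempt at the efficiency of the talk's algorithm. The encoding is an assumption of this sketch: an edge is a triple (i, src, dst), a conditional fact is a pair ((y1, y2), (x1, x2)), and the name tal_close is hypothetical.

```python
def tal_close(opens, closes, ur=()):
    """Close UR and CR facts under the four rules from Pages 27 to 30.

    opens, closes: sets of (i, src, dst) for edges labeled {i and }i.
    ur: initial unconditional facts (x, y).
    Returns (UR, CR), where CR holds ((y1, y2), (x1, x2)) meaning
    "x1 ~> x2 if y1 ~> y2".
    """
    UR, CR = set(ur), set()
    # Rule 1: {i(x1, y1) + }i(y2, x2)  =>  CR_{y1,y2}(x1, x2)
    for (i, x1, y1) in opens:
        for (j, y2, x2) in closes:
            if i == j:
                CR.add(((y1, y2), (x1, x2)))
    changed = True
    while changed:
        new_ur, new_cr = set(), set()
        for (cond, fact) in CR:
            (y1, y2), (x1, x2) = cond, fact
            if y1 == y2:                      # Rule 2: CR_{y,y} => UR
                new_ur.add(fact)
            for (cond2, fact2) in CR:         # Rule 3: chain two CR facts
                if fact2 == cond:
                    new_cr.add((cond2, fact))
            for (x0, t) in UR:                # Rule 4: prepend a UR fact
                if t == x1:
                    new_cr.add((cond, (x0, x2)))
        changed = not (new_ur <= UR and new_cr <= CR)
        UR |= new_ur
        CR |= new_cr
    return UR, CR


# Hypothetical example: x0 ~> x1, then a {1 edge into y and a }1 edge out.
UR, CR = tal_close(
    opens={(1, "x1", "y")},
    closes={(1, "y", "x2")},
    ur={("x0", "x1")},
)
print(sorted(UR))
```

In the example, Rule 1 yields a fact whose condition is y ~> y, Rule 2 discharges it, and Rule 4 extends it backwards to x0, so x0 ~> x2 ends up in UR.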

Page 31: Reachability Analysis for Callbacks

Keeping reachability between only boundary nodes is not sufficient
• Chaining nodes & hidden chaining nodes

Chaining nodes (“connectors”): x1, x2

Hidden chaining nodes (“start/end nodes”): x0, x3

[Figure: path over the nodes x0, x1, x2, x3 with edges labeled {2, }2, }3, {3, }4, {4]

Page 32: Reachability Analysis for Callbacks

$\{_2 + \}_2 \Rightarrow CR_{p,q}(a,b)$
$\{_3 + \}_3 \Rightarrow CR_{r,s}(p,q)$
$\{_4 + \}_4 \Rightarrow CR_{c,d}(r,s)$

[Figure: library graph over nodes a, b, c, d, p, q, r, s with edges {2, {3, {4 and }4, }3, }2]

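Page 33: Reachability Analysis for Callbacks

$\{_2 + \}_2 \Rightarrow CR_{p,q}(a,b)$
$\{_3 + \}_3 \Rightarrow CR_{r,s}(p,q)$
$\{_4 + \}_4 \Rightarrow CR_{c,d}(r,s)$

$CR_{p,q}(a,b) + CR_{r,s}(p,q) \Rightarrow CR_{r,s}(a,b)$
$CR_{r,s}(p,q) + CR_{c,d}(r,s) \Rightarrow CR_{c,d}(p,q)$
$CR_{r,s}(a,b) + CR_{c,d}(r,s) \Rightarrow CR_{c,d}(a,b)$

[Figure: library graph over nodes a, b, c, d, p, q, r, s with edges {2, {3, {4 and }4, }3, }2]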

Page 34: Reachability Analysis for Callbacks

$CR_{p,q}(a,b)$, $CR_{r,s}(p,q)$, $CR_{c,d}(r,s)$, $CR_{r,s}(a,b)$, $CR_{c,d}(p,q)$, $CR_{c,d}(a,b)$

• Redundant reachability relationships

[Figure: library graph over nodes a, b, c, d, p, q, r, s with edges {2, {3, {4 and }4, }3, }2, and a "boundary nodes" label]
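A simple way to picture the redundancy is to drop every fact that mentions a node outside a chosen set of kept nodes. The sketch below does only that. Keeping exactly a, b, c, and d is an assumption for this small example; in general the talk also keeps chaining and hidden chaining nodes, which this sketch does not compute. The function name prune is hypothetical.

```python
# The six conditional facts on this page, encoded as
# ((cond_from, cond_to), (src, dst)) for CR_{cond_from,cond_to}(src, dst).
FACTS = {
    (("p", "q"), ("a", "b")),
    (("r", "s"), ("p", "q")),
    (("c", "d"), ("r", "s")),
    (("r", "s"), ("a", "b")),
    (("c", "d"), ("p", "q")),
    (("c", "d"), ("a", "b")),
}


def prune(facts, keep):
    """Keep only facts whose four nodes all belong to `keep`."""
    return {
        fact for fact in facts
        if all(node in keep for pair in fact for node in pair)
    }


# Assumption for this example: only a, b, c, d need to stay in the summary.
kept = prune(FACTS, keep={"a", "b", "c", "d"})
print(kept)  # {(('c', 'd'), ('a', 'b'))}
```

Only the fact relating the library entry and exit to the callback site survives; the five facts that mention p, q, r, or s are the redundant ones in this example.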

Page 35: Reachability Analysis for Callbacks: Evaluation (15 subjects)

[Figure: chart in which fundamental nodes ("Fund.") account for <10%]

• Fundamental nodes
    • Boundary nodes
    • Chaining nodes
    • Hidden chaining nodes

Page 36: Reachability Analysis for Callbacks: Evaluation (library summarization)

• 3.16X slow-down
• More memory required

Page 37: Reachability Analysis for Callbacks: Evaluation (client-code analysis)

• 8.24X speed-up
• Less memory required