Finding Optimum Abstractions in Parametric Dataflow Analysis

Xin Zhang (Georgia Tech), Mayur Naik (Georgia Tech), Hongseok Yang (University of Oxford)


Page 1: Finding Optimum Abstractions in Parametric Dataflow Analysis

Finding Optimum Abstractions in Parametric Dataflow Analysis

Xin Zhang, Georgia Tech

Mayur Naik, Georgia Tech

Hongseok Yang, University of Oxford

Page 2: Finding Optimum Abstractions in Parametric Dataflow Analysis

A Key Challenge for Static Analysis

Precision

Scalability

Page 3: Finding Optimum Abstractions in Parametric Dataflow Analysis

Our setting

Query q, Program p, Static Analysis S

p ⊢ q or p ⊬ q

Abstraction a

assert(x != null)

Page 4: Finding Optimum Abstractions in Parametric Dataflow Analysis

Our setting

Program p, query q1, abstraction a1 → S → p ⊢ q1 ?

Program p, query q2, abstraction a2 → S → p ⊢ q2 ?

Page 5: Finding Optimum Abstractions in Parametric Dataflow Analysis

Our setting

Program p, query q1 → S → p ⊢ q1 ?

Program p, query q2 → S → p ⊢ q2 ?

Abstraction (bit vector): 1 0 1 1 0 0 1 0 1 0

Page 6: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example 1: Predicate Abstraction

Program p, query q1 → S → p ⊢ q1 ?

Program p, query q2 → S → p ⊢ q2 ?

Abstraction (bit vector): 1 0 1 1 0 0 1 0 1 0

Predicates to use as abstraction predicates

Page 7: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example 2: Cloning-based Pointer Analysis

Program p, query q1 → S → p ⊢ q1 ?

Program p, query q2 → S → p ⊢ q2 ?

Abstraction (bit vector): 1 0 1 1 0 0 1 0 1 0

K value to use for each call and each allocation site

Page 8: Finding Optimum Abstractions in Parametric Dataflow Analysis

Problem Statement

An efficient algorithm with:

INPUTS:
– program p and property q
– abstractions A = { a1, …, an }
– boolean function S(p, q, a)

OUTPUT:
– Proof: a ∈ A: S(p, q, a) = true
  ∀ a' ∈ A: (a' ≤ a ∧ S(p, q, a') = true) ⇒ a' = a    (Optimum Abstraction)
– Impossibility: ∄ a ∈ A: S(p, q, a) = true

Program p, query q, abstraction a → S → p ⊢ q ?

Page 9: Finding Optimum Abstractions in Parametric Dataflow Analysis

Problem Statement

An efficient algorithm with:

INPUTS:
– program p and property q
– abstractions A = { a1, …, an }
– boolean function S(p, q, a)

OUTPUT:
– Proof: a ∈ A: S(p, q, a) = true
  ∀ a' ∈ A: (a' ≤ a ∧ S(p, q, a') = true) ⇒ a' = a    (Optimum Abstraction)
– Impossibility: ∄ a ∈ A: S(p, q, a) = true

The space of abstractions A, split into the region S(p, q, a) and the region !S(p, q, a):

1111 most expensive
0110 optimum
0000 least expensive
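The specification above can be read as a search problem. The following is a minimal brute-force sketch of it, not the efficient algorithm from the talk: it assumes abstractions are bit vectors, that fewer 1-bits means cheaper, and that `S` is supplied as a Python predicate. The names `find_optimum` and `toy_S` are hypothetical.

```python
from itertools import combinations

def find_optimum(n, S):
    """Return a cheapest n-bit abstraction a with S(a) true, or None.

    Candidates are tried in order of increasing number of 1-bits, so the
    first hit has no strictly smaller abstraction that also proves the query.
    """
    for ones in range(n + 1):                      # cheapest candidates first
        for idx in combinations(range(n), ones):
            a = tuple(1 if i in idx else 0 for i in range(n))
            if S(a):
                return a                           # Proof, with a minimal a
    return None                                    # Impossibility

# Hypothetical oracle: the query is provable exactly when bits 1 and 2 are set.
def toy_S(a):
    return a[1] == 1 and a[2] == 1

print(find_optimum(4, toy_S))             # (0, 1, 1, 0)
print(find_optimum(4, lambda a: False))   # None
```

This enumeration may call `S` up to 2^n times, which is exactly the cost the problem statement asks an efficient algorithm to avoid.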

Page 10: Finding Optimum Abstractions in Parametric Dataflow Analysis

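Example: Typestate Analysis

x = new File;          <{closed}, {x}>    (first component: type-state set ts)
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

File type-state automaton (states: closed, opened, error):
closed --open()--> opened
opened --close()--> closed
closed --close()--> error
opened --open()--> error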

Page 11: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

x = new File;          <{closed}, {x}>    (second component: must-alias accesspath set ms)
y = x;                 <{closed}, {x}>
z = x;                 <{closed}, {x}>
x.open();              <{opened}, {x}>    (strong update)
y.close();             <{opened, closed}, {x}>    (weak update)
assert1(x, closed);    Failed
assert2(x, opened);    Failed

The must-alias set only allows the accesspaths specified in the abstraction.
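The forward analysis on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation from the talk: it tracks a single allocation site, treats the abstraction `a` as the set of variables allowed in the must-alias set, and sends any call the automaton does not list to an `error` state. The statement encoding and the names `step`, `analyze`, and `proves` are hypothetical.

```python
# File automaton from the example; any call not listed leads to "error".
DELTA = {("closed", "open"): "opened", ("opened", "close"): "closed"}

def step(ts, ms, stmt, a):
    """Apply one statement to the abstract state <ts, ms> under abstraction a."""
    kind = stmt[0]
    if kind == "new":                       # v = new File
        v = stmt[1]
        return {"closed"}, {v} & a          # track v only if a allows it
    if kind == "copy":                      # dst = src
        dst, src = stmt[1], stmt[2]
        if src in ms and dst in a:
            return ts, ms | {dst}
        return ts, ms - {dst}
    if kind == "call":                      # v.m()
        v, m = stmt[1], stmt[2]
        after = {DELTA.get((s, m), "error") for s in ts}
        if v in ms:
            return after, ms                # strong update: v must alias the object
        return ts | after, ms               # weak update: keep the old states too
    raise ValueError(kind)

def analyze(program, a):
    """Run the forward analysis and return the final <ts, ms>."""
    a = set(a)
    ts, ms = set(), set()
    for stmt in program:
        ts, ms = step(ts, ms, stmt, a)
    return ts, ms

def proves(program, a, expected):
    """A query assert(x, expected) is proven when ts is exactly {expected}."""
    ts, _ = analyze(program, a)
    return ts == {expected}

PROGRAM = [("new", "x"), ("copy", "y", "x"), ("copy", "z", "x"),
           ("call", "x", "open"), ("call", "y", "close")]

print(analyze(PROGRAM, {"x"}))               # ts = {opened, closed}, ms = {x}
print(proves(PROGRAM, {"x", "y"}, "closed")) # True
```

With `a = {"x"}` the sketch reproduces the states on the slide: the call on `x` is a strong update, the call on `y` is a weak update, and both assertions fail.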

Page 12: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

x = new File;
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

Query      Abstraction    Our Goal
assert1    any a
assert2    none           impossibility

Page 13: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

x = new File;
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

Query      Abstraction
assert1
assert2

Naïve approach: calculating weakest precondition (WP)

Abstraction: {}

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);    Failed

Page 14: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Naïve approach: calculating weakest precondition (WP)

Abstraction: {}

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);    Failed

Exponential Blowup!

unreachable

x = new File;
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

Page 15: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);

Too large?

Let's ignore part of it!

Page 16: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);

Unreachable

Page 17: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);

Intersect with the forward state

Page 18: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);

Keep as many disjuncts as possible

Intersect with forward state
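One way to picture the two pruning rules on the preceding slides is as an operation on a formula in disjunctive normal form. The sketch below is an assumption-laden illustration, not the operator defined in the paper: it encodes each disjunct as a set of `(name, value)` literals, keeps only disjuncts that hold in the forward state, and then keeps at most `k` of those. The function name `underapprox`, the literal names, and the bound `k` are all hypothetical.

```python
def underapprox(dnf, forward, k):
    """Shrink a DNF formula to at most k disjuncts that hold in `forward`.

    dnf:     list of disjuncts, each a frozenset of (name, bool) literals
    forward: dict mapping names to the truth values seen in the forward run
    Dropping disjuncts from a DNF can only strengthen it, so the result
    describes a subset of the original formula's models.
    """
    def holds(conj):
        return all(forward.get(name) == value for name, value in conj)

    consistent = [c for c in dnf if holds(c)]    # intersect with forward state
    consistent.sort(key=len)                     # prefer shorter (weaker) disjuncts
    return consistent[:k]                        # keep as many as the budget allows

# Hypothetical failure condition: "x not in a" or "x in a and y not in a".
failure = [
    frozenset({("x_in_a", False)}),
    frozenset({("x_in_a", True), ("y_in_a", False)}),
]
# Forward run used the empty abstraction.
print(underapprox(failure, {"x_in_a": False, "y_in_a": False}, k=2))
```

Because only disjuncts consistent with the forward run are kept, the abstraction that just failed stays covered by the pruned formula whenever the original formula covered it.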

Page 19: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

x = new File;
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);    Failed

Page 20: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert1(x, closed);    Failed

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

Page 21: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {x}>
↑ y = x;                 ↓ <{closed}, {x}>
↑ z = x;                 ↓ <{closed}, {x}>
↑ x.open();              ↓ <{opened}, {x}>
↑ y.close();             ↓ <{opened}, {x}>
↑ assert1(x, closed);    Failed

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

Page 22: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {x}>
↑ y = x;                 ↓ <{closed}, {x}>
↑ z = x;                 ↓ <{closed}, {x}>
↑ x.open();              ↓ <{opened}, {x}>
↑ y.close();             ↓ <{opened}, {x}>
↑ assert1(x, closed);    Failed

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

Page 23: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

x = new File;          ↓ <{closed}, {x}>
y = x;                 ↓ <{closed}, {x, y}>
z = x;                 ↓ <{closed}, {x, y}>
x.open();              ↓ <{opened}, {x, y}>
y.close();             ↓ <{closed}, {x, y}>
assert1(x, closed);    Proof!

Abstraction space: x ∉ a; y ∈ a | y ∉ a

Page 24: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

x = new File;
y = x;
z = x;
x.open();
y.close();
assert1(x, closed);
assert2(x, opened);

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert2(x, opened);    Failed

Page 25: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {}>
↑ y = x;                 ↓ <{closed}, {}>
↑ z = x;                 ↓ <{closed}, {}>
↑ x.open();              ↓ <{closed, opened}, {}>
↑ y.close();             ↓ top
↑ assert2(x, opened);    Failed

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

Page 26: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {x}>
↑ y = x;                 ↓ <{closed}, {x}>
↑ z = x;                 ↓ <{closed}, {x}>
↑ x.open();              ↓ <{opened}, {x}>
↑ y.close();             ↓ <{opened, closed}, {x}>
↑ assert2(x, opened);    Failed

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

Page 27: Finding Optimum Abstractions in Parametric Dataflow Analysis

Example: Typestate Analysis

Query      Abstraction
assert1
assert2

Our approach: WP + Underapproximation

↑ x = new File;          ↓ <{closed}, {x}>
↑ y = x;                 ↓ <{closed}, {x}>
↑ z = x;                 ↓ <{closed}, {x}>
↑ x.open();              ↓ <{opened}, {x}>
↑ y.close();             ↓ <{opened, closed}, {x}>
↑ assert2(x, opened);    Failed

Impossibility!

Abstraction space: x ∈ a | x ∉ a; y ∈ a | y ∉ a

In paper: a general framework for parametric dataflow analysis
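The walkthrough on the preceding slides follows a loop: run the forward analysis with the cheapest abstraction still considered viable, and on failure rule out a whole set of abstractions before trying again. The sketch below shows only that loop shape. It is a hedged illustration: the viable set is enumerated explicitly (a real implementation would presumably keep it symbolic), and `explain_failure` stands in for the backward analysis with a hand-written toy. The names `refine`, `toy_analyze`, `toy_explain`, and `NEEDED` are hypothetical.

```python
from itertools import combinations

def refine(params, analyze, explain_failure):
    """Return (optimum abstraction or None, number of forward runs).

    analyze(a)         -> True if the forward analysis proves the query under a
    explain_failure(a) -> predicate marking abstractions that are sure to fail
    """
    names = sorted(params)
    # All subsets of the parameters, cheapest (smallest) first.
    viable = [frozenset(c) for k in range(len(names) + 1)
              for c in combinations(names, k)]
    runs = 0
    while viable:
        a = viable[0]                       # cheapest abstraction not ruled out
        runs += 1
        if analyze(a):
            return a, runs                  # Proof
        doomed = explain_failure(a)
        # Always drop a itself so the loop makes progress.
        viable = [b for b in viable if b != a and not doomed(b)]
    return None, runs                       # Impossibility

# Toy stand-ins (assumptions): the query needs both x and y to be tracked.
NEEDED = ("x", "y")

def toy_analyze(a):
    return all(v in a for v in NEEDED)

def toy_explain(a):
    missing = next(v for v in NEEDED if v not in a)
    return lambda b: missing not in b       # e.g. every b with x not in b fails

print(refine({"x", "y", "z"}, toy_analyze, toy_explain))
print(refine({"x", "y", "z"}, lambda a: False, lambda a: (lambda b: True)))
```

On the toy inputs the loop tries {}, then {x}, then {x, y}, mirroring the three forward runs shown for assert1. When a failure explanation rules out every remaining abstraction, the loop reports impossibility without trying them.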

Page 28: Finding Optimum Abstractions in Parametric Dataflow Analysis

Experiment

Implementation in Chord for Java programs

2 Client Analyses: Typestate and Thread-Escape

Both fully context- and flow-sensitive analyses

Only scale with sparse parameters

7 Java Benchmarks

Page 29: Finding Optimum Abstractions in Parametric Dataflow Analysis

Benchmarks

name        bytecode (KB)    KLOC    log|A| (thread-escape)    log|A| (typestate)
tsp         391              269     569                       6,175
elevator    390              269     352                       6,180
hedc        442              283     1,400                     7,326
weblech     504              326     2,993                     7,663
antlr       532              303     16,563                    7,748
avrora      634              340     37,797                    10,151
lusearch    511              314     14,508                    7,395

Page 30: Finding Optimum Abstractions in Parametric Dataflow Analysis

Precision: Thread-Escape Analysis

[Figure: stacked bar chart. x-axis: benchmarks tsp, elevator, hedc, weblech, antlr, avrora, lusearch, and AVG.; y-axis: % queries (0% to 100%); series: Proven, Impossible, Unresolved.]

Total # queries: tsp 209, elevator 221, hedc 552, weblech 658, antlr 5857, avrora 14322, lusearch 6726

Resolved: ~90%. Previous: ~40% [POPL12]

Page 31: Finding Optimum Abstractions in Parametric Dataflow Analysis

Precision: Typestate Analysis

[Figure: stacked bar chart. x-axis: benchmarks tsp, elevator, hedc, weblech, antlr, avrora, lusearch, and AVG.; y-axis: % queries (0% to 100%); series: Proven, Impossible.]

Total # queries: tsp 12, elevator 72, hedc 170, weblech 71, antlr 7903, avrora 5052, lusearch 3644

Page 32: Finding Optimum Abstractions in Parametric Dataflow Analysis

Scalability: Number of iterations

[Figure: histogram for avrora. x-axis: # analysis iterations (bins 1 to 10 and 11-97); y-axis: # queries (0 to 6000); series: Proven, Impossible.]

Page 33: Finding Optimum Abstractions in Parametric Dataflow Analysis

Scalability: Number of iterations

[Figure: histograms of # queries against # analysis iterations, one panel per benchmark: antlr (bins 1 to 10 and 11-88; y-axis 0 to 1200), lusearch (bins 1 to 10 and 11-20; y-axis 0 to 2500), avrora (bins 1 to 10 and 11-97; y-axis 0 to 6000).]

Page 34: Finding Optimum Abstractions in Parametric Dataflow Analysis

Scalability: Running time

[Figure: histogram for avrora. x-axis: analysis time in minutes (bins 0-1 to 9-10 and 10-173); y-axis: # queries (0 to 6000); series: Proven, Impossible.]

Page 35: Finding Optimum Abstractions in Parametric Dataflow Analysis

Scalability: Running time

[Figure: histograms of # queries against analysis time in minutes, one panel per benchmark: antlr (bins 0-1 to 9-10 and 10-76; y-axis 0 to 1200), lusearch (bins 0-1 to 9-10 and 10-44; y-axis 0 to 2000), avrora (bins 0-1 to 9-10 and 10-173; y-axis 0 to 6000); series: Proven, Impossible.]

Page 36: Finding Optimum Abstractions in Parametric Dataflow Analysis

Size of optimal abstractions

[Figure: bar chart for avrora. x-axis: size of abstraction |a|; y-axis: # proven queries (0 to 6000).]

size of abstraction |a|    # proven queries (avrora)
1                          5436
2                          954
3                          892
4                          164
5                          68
6                          110
7                          13
8                          181
9                          41
10                         13
11-96                      123

Page 37: Finding Optimum Abstractions in Parametric Dataflow Analysis

Size of optimal abstractions

[Figure: bar charts of # proven queries against size of abstraction |a|, one panel per benchmark: antlr, lusearch, avrora.]

size of abstraction |a|    antlr    lusearch    avrora
1                          1275     2345        5436
2                          706      805         954
3                          390      295         892
4                          79       129         164
5                          39       86          68
6                          19       23          110
7                          4        4           13
8                          2        3           181
9                          6        15          41
10                         3        2           13
last bin                   13       1           123

Last bin ranges: antlr 11-87, lusearch 18-18, avrora 11-96.

Page 38: Finding Optimum Abstractions in Parametric Dataflow Analysis

Related work

Modern pointer analysis

Demand-driven, query-driven, … Heintze & Tardieu ’01, Guyer & Lin ’03, Sridharan & Bodik ’06, ...

CEGAR model checkers: SLAM, BLAST, YOGI, …

Work on concrete counterexamples

Can disprove queries

1. No optimality guarantee: can over-refine and hurt scalability.

2. No impossibility: can cause divergence.

Page 39: Finding Optimum Abstractions in Parametric Dataflow Analysis

Thank you!

Q&A