Shape Analysis With Reference Sets

Mark Marron, IMDEA-Software (Madrid, Spain)

We want to provide basic information about the program heap to support a range of client applications: IDE tools (query, refactoring, etc.), optimization, and error detection. The focus is on scalable, manageable models and tools, even at the cost of overall expressivity and analytic power.

Fix sharing info extraction. Add disjoint/overlaps for set information.

Point out that more than just variable relations is desirable, since variables are transient.

Track basic set relations: membership, overlapping, non-overlapping, subset, and set equality. Ensure a small computational cost. High precision is not required, but common cases must be handled accurately: iterative subset construction/mutation, and set-style library operations such as Union (AddAll), Intersection, IsSubset, and Contains.
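The relations listed above map directly onto ordinary set operations. A minimal sketch using Python's built-in sets follows; the object names (`o1`, `o2`, ...) and the variables `all_items`, `selected`, and `rest` are hypothetical stand-ins for collections of heap objects, not anything taken from the analysis itself.

```python
# Hypothetical object identities; any hashable values would do.
all_items = {"o1", "o2", "o3", "o4"}

# Iterative subset construction: pick some elements out of a larger set.
selected = set()
for obj in sorted(all_items):
    if obj in ("o1", "o3"):
        selected.add(obj)

rest = all_items - selected

membership = "o1" in selected              # Contains
is_subset = selected <= all_items          # IsSubset
overlapping = bool(selected & all_items)   # Intersection is non-empty
disjoint = selected.isdisjoint(rest)       # Non-overlapping
equal = (selected | rest) == all_items     # Union (AddAll) and set equality
```

These are the concrete facts the analysis would like to recover statically, without enumerating the objects.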

Start with an existing model that decomposes the heap into related regions; this reduces the complexity of the set formulas that are needed. The storage shape graph works well: nodes represent sets of objects (or data structures) and edges represent sets of pointers. Fine-grained partitioning is possible, and disjointness properties are natural (and mostly free). Annotate edges with additional properties to track reference set relations.
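As a rough illustration of the data structure described here, the sketch below models a storage shape graph with nodes for regions, edges for sets of pointers, and a side table for edge annotations. All class and field names (`Node`, `Edge`, `ShapeGraph`, `annotations`) are assumptions made for this example and are not taken from the actual tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str      # label for a region (a set of objects)


@dataclass(frozen=True)
class Edge:
    source: str    # variable name or name of the source region
    label: str     # field or offset in which the pointers are stored
    target: Node   # region the abstracted pointers point into


@dataclass
class ShapeGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)
    # Extra properties attached to edges (or groups of edges).
    annotations: dict = field(default_factory=dict)

    def add_edge(self, edge):
        self.nodes.add(edge.target)
        self.edges.add(edge)
        return edge

    def in_edges(self, node):
        return {e for e in self.edges if e.target == node}


# Two arrays V and W whose elements point into the same region.
g = ShapeGraph()
data = Node("data")
e_v = g.add_edge(Edge("V", "[]", data))
e_w = g.add_edge(Edge("W", "[]", data))
g.annotations[frozenset((e_v, e_w))] = "equal target sets"
```

Distinct nodes stand for disjoint regions, which is why disjointness comes almost for free in this representation.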


A key issue for the shape graph approach is how to group concrete objects into abstract nodes. Too many nodes is confusing and computationally expensive; too few nodes leads to imprecision, as a single node must represent multiple logical structures. Grouping is often done via allocation site or types. Solution: nodes are sets of similar objects, based on recursive type information (recursive vs. non-recursive types) and on whether objects are stored in the same collection, array, or structure.


Given a set of heap references $R$, the corresponding target set is $\{\, \text{Object } o \mid \exists r \in R \text{ that points to } o \,\}$. Two sets of heap references can then be related via set relations on their target sets. As the heap is partitioned into regions of objects, we also define a notion of coverage: a reference set covers a region if every object in the region is in the corresponding target set.
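The two definitions above can be checked on a small concrete heap. The sketch below is only illustrative: the `points_to` map, the reference names (`V[0]`, `W[0]`, ...), and the object names are hypothetical, and a null reference is modeled as `None`.

```python
def target_set(refs, points_to):
    """Objects pointed to by at least one reference in refs."""
    return {points_to[r] for r in refs if points_to.get(r) is not None}


def covers(refs, region, points_to):
    """True if every object in region is in the target set of refs."""
    return region <= target_set(refs, points_to)


# Hypothetical concrete heap: reference name -> object name (None = null).
points_to = {"V[0]": "a", "V[1]": "b", "V[2]": "c", "W[0]": "a", "W[1]": None}
region = {"a", "b", "c"}

v_refs = {"V[0]", "V[1]", "V[2]"}
w_refs = {"W[0]", "W[1]"}

v_covers = covers(v_refs, region, points_to)   # V reaches every object
w_covers = covers(w_refs, region, points_to)   # W misses "b" and "c"
```

Here the references in `V` cover the region while those in `W` do not, and the target set of `W` is a subset of the target set of `V`.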

There are several possible choices for representing these relations: a theory of sets over all objects/references, full binary relations on power sets of edges, or a reduced set of relations. For efficiency we use a reduced set of relations: equality of the reference sets abstracted by pairs of edges ($E \times E$), and a relation from sets of edges to the nodes that are covered by the abstracted references ($\mathcal{P}(E) \times N$).
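One possible encoding of the reduced relations is sketched below, under two assumptions that the slides do not spell out: equality is stored as unordered pairs of edges, and a set of edges is read as jointly covering a node. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceSetRelations:
    # Unordered pairs of edges whose abstracted references have
    # equal target sets.
    equal_pairs: set = field(default_factory=set)
    # Pairs (frozenset of edges, node): the edge set covers the node.
    cover_facts: set = field(default_factory=set)

    def add_equal(self, e1, e2):
        self.equal_pairs.add(frozenset((e1, e2)))

    def are_equal(self, e1, e2):
        # Only recorded pairs are consulted; no transitive closure.
        return e1 == e2 or frozenset((e1, e2)) in self.equal_pairs

    def add_cover(self, edges, node):
        self.cover_facts.add((frozenset(edges), node))

    def is_covered_by(self, node, edges):
        # A superset of a covering edge set also covers the node.
        edges = frozenset(edges)
        return any(n == node and es <= edges for es, n in self.cover_facts)


rel = ReferenceSetRelations()
rel.add_equal("e1", "e2")
rel.add_cover({"e1"}, "n")
```

Storing only these two kinds of facts keeps the per-graph state small, at the price of not expressing arbitrary set formulas.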

Track target set equality of the pointers abstracted by pairs of edges


Track whether all objects in the region represented by a node are contained in the target sets of the given edges


A number of useful inferences can be made from these two properties. If $e$ and $e'$ are edge equivalent and $e$ has an empty concretization, then $e'$ must have an empty concretization as well. If an edge $e$ covers node $n$, then any other in-edge of $n$ represents a target set that is a subset of (or equal to) the target set for edge $e$.
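The two inferences can be phrased as small functions over the abstract facts. This is a sketch under assumed representations (edges as strings, equivalences as a list of pairs, in-edges as a dictionary from node to edge set); the function names are hypothetical.

```python
def propagate_empty(empty_edges, equal_pairs):
    """Close the set of known-empty edges under edge equivalence."""
    empty = set(empty_edges)
    changed = True
    while changed:
        changed = False
        for a, b in equal_pairs:
            # If exactly one side is known empty, the other must be too.
            if (a in empty) != (b in empty):
                empty |= {a, b}
                changed = True
    return empty


def subset_facts(covering_edge, node, in_edges):
    """If covering_edge covers node, every other in-edge of node has a
    target set contained in the target set of covering_edge.
    Returns pairs (smaller, larger)."""
    return {(e, covering_edge) for e in in_edges[node] if e != covering_edge}


empties = propagate_empty({"e1"}, [("e1", "e2"), ("e2", "e3"), ("e4", "e5")])
subsets = subset_facts("e1", "n", {"n": {"e1", "e2", "e3"}})
```

The second inference holds because every in-edge of `n` can only point at objects in the region of `n`, and all of those objects are already in the target set of the covering edge.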


Note that the proposed reference set relations subsume classic must-alias. In the concrete model, variables x == y (x, y non-null) iff Target(x) = Target(y). In the abstract model, the variables x, y must-alias iff the corresponding edges $e_x$ and $e_y$ are edge equivalent.
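The concrete half of this claim is easy to check by hand. In the sketch below, the environment `env` (variable name to object name, `None` for null) and the helper names are hypothetical; a single variable is treated as a reference set of size one.

```python
def target(var, env):
    """Target set of a single variable: the one object it points to, if any."""
    obj = env.get(var)
    return set() if obj is None else {obj}


def must_alias(x, y, env):
    """For non-null x and y, x == y iff their target sets are equal."""
    tx, ty = target(x, env), target(y, env)
    return bool(tx) and tx == ty


env = {"x": "o1", "y": "o1", "z": "o2", "w": None}

xy = must_alias("x", "y", env)   # same object
xz = must_alias("x", "z", env)   # different objects
ww = must_alias("w", "w", env)   # null is excluded by the non-null condition
```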

...
for (int i = 0; i < V.Length; ++i)
    V[i].f = 0;


Benchmark   LOC    Analysis Time   Analysis Mem
em3d        1103   0.11s

Tracking reference set information is computationally inexpensive. The results are precise enough to model many interesting and important relations, surprisingly so. Why? Most conditions end up being simple. Is this a general property? Are most programs made of simple relations/concepts that are composed into complex concepts? (We hope so.) Could we use rich set decision procedures, e.g. if all conditions are simple, are most proofs easy/fast with the right decomposition?

Build a strong foundation for other tools to utilize. Transform core concepts from prototype to robust tools. Finish the implementation of the static analysis for CLI bytecode and core libraries (also runtime support). Export results to Visual Studio for inspection, spec. generation, or other tools. Apply results in optimization, refactoring, and error detection applications.


Region similarity: is the program treating the objects uniformly, or has it done something to indicate that the objects in the two regions differ in a meaningful way?

Benchmark sizes are up to 5809 LOC (normalized + library stubs). Runtimes are reasonable (under most measures) for use in optimizing applications, and exceptionally fast compared to other shape analysis techniques (the only comparable one handles only lists of lists with no sharing, and takes 100-1000 seconds for an equivalent size).
