Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective...

Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence Robert Dyer Bowling Green State University Tien N. Nguyen Iowa State University Hridesh Rajan Iowa State University Gary T. Leavens University of Central Florida

Inferring Behavioral Specifications from Large-scale Repositories by Leveraging Collective Intelligence

Robert Dyer

Bowling Green State University

Tien N. Nguyen

Iowa State University

Hridesh Rajan

Iowa State University

Gary T. Leavens

University of Central Florida

2

Lack of specifications* is a critical SE problem

*specifications: {Pre} S {Post}, i.e., behavioral specifications
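One concrete reading of {Pre} S {Post}: if Pre holds before S runs, then Post holds afterwards. The following is a minimal sketch only. The function `integer_sqrt` is a hypothetical example that does not appear in the talk, and checking the conditions at run time is just one way to make them visible.

```python
def integer_sqrt(n: int) -> int:
    """Return the largest r with r * r <= n (hypothetical example)."""
    # Pre: n >= 0
    if n < 0:
        raise ValueError("precondition violated: n must be non-negative")

    # S: the body being specified
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1

    # Post: r * r <= n < (r + 1) * (r + 1)
    if not (r * r <= n < (r + 1) * (r + 1)):
        raise AssertionError("postcondition violated")
    return r
```

A caller that respects the precondition can rely on the postcondition without reading the body, which is the information that is missing when code ships without specifications.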

3

void m(…) { … }

void m(…) {
  // pre: P
  …
  // post: Q
}

Specification Inference

4

Our Goal

5

SpecGuru

A specification inference infrastructure

6

Our Goal

7

Related work focuses on single projects: [Ernst et al. 99] [Flanagan et al. 01] [Weimer et al. 05] [Ramanathan et al. 07] [Wei et al. 11]

8

How can we effectively leverage these projects?

Big Code: a large number of readily available projects

9

Aim: multi-step specification inference
- Bootstrap using consensus
- Propagate using similarity
- Last mile using decomposition

Consensus, Similarity, Decomposition

10

Challenge:

separate project-specific constraints (chaff) from the API-specific preconditions (wheat) they are mixed with

11

Our initial work [1]:

use consensus across a large number of projects to infer API-specific preconditions

[1] FSE’14 – H. Nguyen, Dyer, T. Nguyen, Rajan

12

Key Ideas

Preconditions can be mined from the guard conditions at call sites in code that uses the APIs

Preconditions mined from multiple projects in a large-scale code corpus can be used to filter out chaff

void m(…) {
  …
  if (pred) lib.api();
  …
}
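The two key ideas above can be sketched as a counting exercise: collect the guard at each call site, then keep only the guards that recur across many independent projects. This is a simplified illustration, not the algorithm from [1]. The input format, the `min_support` threshold, the sample data, and the assumption that guards are already normalized to comparable strings are all hypothetical.

```python
from collections import defaultdict


def mine_preconditions(call_sites, min_support=0.5):
    """Infer API preconditions by cross-project consensus.

    call_sites: iterable of (project, api, guard) triples, where guard is a
    normalized predicate string guarding the call, or None if unguarded.
    Returns {api: [guards]} keeping guards used by at least `min_support`
    of the projects that call the API.
    """
    projects_using = defaultdict(set)     # api -> projects that call it
    projects_guarding = defaultdict(set)  # (api, guard) -> projects using that guard
    for project, api, guard in call_sites:
        projects_using[api].add(project)
        if guard is not None:
            projects_guarding[(api, guard)].add(project)

    result = defaultdict(list)
    for (api, guard), projects in projects_guarding.items():
        support = len(projects) / len(projects_using[api])
        if support >= min_support:  # consensus: wheat; otherwise chaff
            result[api].append(guard)
    return {api: sorted(guards) for api, guards in result.items()}


# Made-up call sites for illustration only.
call_sites = [
    ("projA", "String.substring", "arg0 >= 0"),
    ("projA", "String.substring", "user.isAdmin()"),  # project-specific chaff
    ("projB", "String.substring", "arg0 >= 0"),
    ("projC", "String.substring", "arg0 >= 0"),
    ("projC", "String.substring", None),              # unguarded call
]
mined = mine_preconditions(call_sites)
```

Counting distinct projects rather than raw call sites keeps one large project from dominating the vote, which is why a guard like `user.isAdmin()` is filtered out here even though it guards a real call.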

13

Challenge:

not all APIs are widely used

14

Key Ideas

Similar code should have similar specifications

Cluster pairs of (code, spec) from the previous phase to find similar code/specs
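As a rough illustration of propagating specifications by similarity, the sketch below gives an unspecified code fragment the spec of its most similar already-specified fragment. It swaps in a nearest-neighbor lookup with Jaccard similarity over token sets in place of the clustering named on the slide. The token abstraction (`param0`), the `min_similarity` threshold, and the sample data are assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def propagate_spec(unspecified_tokens, specified, min_similarity=0.6):
    """Return the spec of the most similar specified fragment, or None.

    specified: list of (tokens, spec) pairs produced by an earlier phase.
    """
    best_spec, best_score = None, 0.0
    for tokens, spec in specified:
        score = jaccard(unspecified_tokens, tokens)
        if score > best_score:
            best_spec, best_score = spec, score
    return best_spec if best_score >= min_similarity else None


# Made-up (code, spec) pairs; identifiers are abstracted to param0.
specified = [
    (["param0", "<", "len", "index", "buf"], "0 <= param0 < len"),
    (["open", "param0", "read", "close"], "param0 != null"),
]
candidate = ["param0", "<", "len", "index", "data"]
inferred = propagate_spec(candidate, specified)
```

The threshold matters because a weak match would propagate a wrong specification. Fragments that fall below it stay unspecified, which is the gap the decomposition phase targets.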

15

Challenge:

unique API code would still be unspecified

16

Key Ideas

Revision history and slicing can help decompose code into fragments

Simple fragments may already be specified by prior phases – compose new specification from those pieces
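The composition step can be sketched as a lookup followed by a conjunction. This is a deliberately naive illustration. It assumes the fragments have already been produced by some slicing or revision-history analysis (not shown), that earlier phases left their results in a dictionary, and that fragment preconditions can simply be conjoined. A real composition would also have to account for how earlier fragments change the state seen by later ones.

```python
def compose_precondition(fragments, known_specs):
    """Compose a precondition for code decomposed into fragments.

    fragments: ordered list of fragment keys (hypothetical decomposition output).
    known_specs: {fragment: precondition} from earlier phases.
    Returns (precondition, unspecified_fragments).
    """
    conjuncts, unspecified = [], []
    for fragment in fragments:
        spec = known_specs.get(fragment)
        if spec is None:
            unspecified.append(fragment)   # still needs another source
        elif spec not in conjuncts:
            conjuncts.append(spec)         # keep each conjunct once
    if not conjuncts:
        return "true", unspecified
    return " && ".join(f"({c})" for c in conjuncts), unspecified


# Made-up fragment specs for illustration only.
known_specs = {
    "s.substring(i)": "0 <= i && i <= s.length()",
    "list.get(j)": "0 <= j && j < list.size()",
}
fragments = ["s.substring(i)", "list.get(j)", "custom.step()"]
precondition, missing = compose_precondition(fragments, known_specs)
```

Returning the unspecified fragments alongside the composed precondition makes the coverage of the result explicit instead of silently treating unknown pieces as unconstrained.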

17

Lack of specifications, a critical hurdle for high-assurance SE, can be overcome by leveraging big code mining, especially consensus, similarity, and decomposition.

boa.cs.iastate.edu