Learning from other's mistakes: Data-driven code analysis


Andreas Dewes (@japh44)

andreas@quantifiedcode.com

13.04.2015

PyCon 2015 – Montreal

Physicist and Python enthusiast

CTO of a spin-off of the University of Munich (LMU):

We develop software for data-driven code analysis.

Our mission

Tools & Techniques for Ensuring Code Quality

- Static, manual: manual code reviews

- Static, automated: static analysis / automated code reviews

- Dynamic, manual: debugging, profiling

- Dynamic, automated: unit testing, integration testing, system testing

Discovering problems in code

def encode(obj):
    """Encode a (possibly nested) dictionary containing complex values
    into a form that can be serialized using JSON."""
    e = {}
    for key,value in obj:
        if isinstance(value,dict):
            e[key] = encode(value)
        elif isinstance(value,complex):
            e[key] = {'type' : 'complex',
                      'r' : value.real,
                      'i' : value.imaginary}
    return e

d = {'a' : 1j+4,'s' : {'d' : 4+5j}}
print(encode(d))

Iterating over obj returns only the keys of the dictionary (obj.items() is needed).

value.imaginary does not exist (value.imag would be correct).
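Applying the two fixes from the notes (iterating over obj.items() and using value.imag) gives a version that should satisfy the assertions in test_encode. This is a minimal sketch: the else branch, which passes values that are neither dicts nor complex numbers through unchanged, is our own assumption, since the original silently drops them.

```python
import json

def encode(obj):
    """Encode a (possibly nested) dictionary containing complex values
    into a form that can be serialized using JSON."""
    e = {}
    for key, value in obj.items():  # fix 1: iterate over (key, value) pairs
        if isinstance(value, dict):
            e[key] = encode(value)
        elif isinstance(value, complex):
            e[key] = {'type': 'complex',
                      'r': value.real,
                      'i': value.imag}  # fix 2: .imag instead of .imaginary
        else:
            e[key] = value  # assumption: other values pass through unchanged
    return e

d = {'a': 1j + 4, 's': {'d': 4 + 5j}}
print(json.dumps(encode(d), sort_keys=True))
```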

Dynamic Analysis (e.g. unit testing)


def test_encode():
    d = {'a' : 1j+4,
         's' : {'d' : 4+5j}}
    r = encode(d) # this will fail...
    assert r['a'] == {'type' : 'complex', 'r' : 4,'i' : 1}
    assert r['s']['d'] == {'type' : 'complex', 'r' : 4,'i' : 5}

Static Analysis (for humans)

encode is a function with one parameter, which always returns a dict.

(I): obj should be an iterator/list of tuples with two elements.

encode gets called with a dict, which does not satisfy (I).

A value of type complex does not have an .imaginary attribute!

encode is called recursively with a dict, which again does not satisfy (I).


How static analysis tools work (short version)

1. Compile the code into a data structure, typically an abstract syntax tree (AST).

2. (Optionally) annotate it with additional information to make analysis easier.

3. Traverse the (annotated) AST to find problems.
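Steps 1 and 3 can be sketched with Python's built-in ast module. This is a minimal sketch, not how any of the tools listed here are implemented: it skips the annotation step and simply flags every attribute access named imaginary, whatever the type of the object. The helper name find_suspicious_attributes is hypothetical.

```python
import ast

SOURCE = '''
def encode(obj):
    e = {}
    for key, value in obj:
        if isinstance(value, complex):
            e[key] = {'type': 'complex', 'r': value.real, 'i': value.imaginary}
    return e
'''

def find_suspicious_attributes(source, names=("imaginary",)):
    """Return (line, attribute) pairs for attribute accesses whose name is in `names`."""
    tree = ast.parse(source)             # step 1: compile the code into an AST
    findings = []
    for node in ast.walk(tree):          # step 3: traverse the tree looking for problems
        if isinstance(node, ast.Attribute) and node.attr in names:
            findings.append((node.lineno, node.attr))
    return findings

print(find_suspicious_attributes(SOURCE))
```

Without step 2 the check cannot tell a complex number from any other object, which is exactly the information a type annotation pass would add.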

Python Tools for Static Analysis

PyLint (most comprehensive tool): http://www.pylint.org/

PyFlakes (smaller, less verbose): https://pypi.python.org/pypi/pyflakes

Pep8 (style and some structural checks): https://pypi.python.org/pypi/pep8

(... and many others)

Limitations of current tools & technologies

Checks are hard to create or modify (example: the PyLint code for analyzing try/except statements).

Long feedback cycles

Rethinking code analysis for Python

Our approach

1. Code is data! Let's not keep it in text files but store it in a useful form that we can work with easily (e.g. a graph).

2. Make it super-easy to specify errors and bad code patterns.

3. Make it possible to learn from user feedback and publicly available code.

Building the Code Graph

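[Figure: the code graph of encode. AST nodes such as functiondef, name and assign are linked by labeled edges (body, targets, iterator), carry attributes such as {name: 'encode', args: [...]}, {id: 'e'} and {i : 1}, are labeled with hashes (e4fa76b..., a76fbc41..., c51fa291..., 74af219...), and can be annotated with inferred types such as $type: dict.]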
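One way to turn an AST into such a graph is sketched below, assuming a content-hash scheme: each node is stored under a SHA-1 hash of its type, its scalar attributes and the hashes of its children. The slides do not say how the hashes are computed, so this scheme and the helper add_to_graph are illustrative assumptions. Because line numbers are not part of the hash, identical subtrees collapse into a single graph node.

```python
import ast
import hashlib

def add_to_graph(node, graph):
    """Store `node` and its subtree in `graph` (hash -> record); return the node's hash."""
    attrs, edges = {}, []
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            edges.append((field, add_to_graph(value, graph)))  # labeled edge, e.g. "value"
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    edges.append((field, add_to_graph(item, graph)))  # e.g. "body", "targets"
                else:
                    attrs.setdefault(field, []).append(item)  # plain values, e.g. Global.names
        else:
            attrs[field] = value  # scalar attribute, e.g. {"id": "e"} or {"name": "encode"}
    kind = type(node).__name__
    # The hash covers the node type, its attributes and its children's hashes but not
    # line numbers, so identical subtrees anywhere in the code get the same hash.
    digest = hashlib.sha1(repr((kind, attrs, edges)).encode()).hexdigest()
    graph[digest] = {"type": kind, "attrs": attrs, "edges": edges}
    return digest

graph = {}
source = "def f(x):\n    return x\n\ndef g(x):\n    return x\n"
root = add_to_graph(ast.parse(source), graph)
types = [record["type"] for record in graph.values()]
print(types.count("FunctionDef"), types.count("Return"))
```

In this small example the two functions stay separate nodes (their names differ), while the two return x statements end up as one node shared by both.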

Example: Tornado Project

[Figure: code graph of 10 modules from the Tornado project, showing modules, classes and functions.]

Advantages

- Simple detection of (exact) duplicates

- Semantic diffing of modules, classes, functions, ...

- Semantic code search on the whole tree
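Exact duplicate detection can be approximated with the standard library alone. The hypothetical helper find_duplicate_functions below compares the ast.dump output of each function's arguments and body, which ignores line numbers and function names. It is a sketch of the idea, not the semantic diffing or search machinery the slide refers to.

```python
import ast
import hashlib
from collections import defaultdict

def find_duplicate_functions(source):
    """Group functions whose arguments and bodies are structurally identical."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # ast.dump leaves out line/column info by default, so only structure counts;
            # the function name is deliberately not part of the key
            key = ast.dump(node.args) + "".join(ast.dump(stmt) for stmt in node.body)
            groups[hashlib.sha1(key.encode()).hexdigest()].append(node.name)
    return [names for names in groups.values() if len(names) > 1]

SOURCE = '''
def area(w, h):
    return w * h

def surface(w, h):
    return w * h

def perimeter(w, h):
    return 2 * (w + h)
'''

print(find_duplicate_functions(SOURCE))
```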

Describing Code Errors / Anti-Patterns

Code issues = patterns on the graph

[Figure: the .imaginary bug as a graph pattern: an attribute node {id : imaginary} whose value is the name node {id : value}, annotated with $type: complex.]

Using YAML to describe graph patterns

node_type: attribute
value:
    $type: complex
attr: imaginary
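Matching such a pattern against code takes only a small recursive function. In this sketch, under stated assumptions, the pattern is a plain Python dict equivalent to the YAML (parsing YAML would need the third-party PyYAML package), node_type is compared with the lowercased ast class name, and $type is checked against a hypothetical types dict that stands in for the type-annotation step. It is not the matcher QuantifiedCode uses.

```python
import ast

def matches(node, pattern, types):
    """Check an AST node against a dict pattern (a Python stand-in for the YAML)."""
    for key, expected in pattern.items():
        if key == "node_type":
            if type(node).__name__.lower() != expected:
                return False
        elif key == "$type":
            # hypothetical annotation step: `types` maps variable names to type names
            if not (isinstance(node, ast.Name) and types.get(node.id) == expected):
                return False
        else:
            actual = getattr(node, key, None)
            if isinstance(expected, dict):  # nested pattern, e.g. value: {$type: complex}
                if not (isinstance(actual, ast.AST) and matches(actual, expected, types)):
                    return False
            elif actual != expected:  # plain field, e.g. attr: imaginary
                return False
    return True

def find_matches(source, pattern, types):
    """Return the line numbers of all nodes in `source` that match `pattern`."""
    tree = ast.parse(source)
    return [getattr(n, "lineno", None) for n in ast.walk(tree) if matches(n, pattern, types)]

PATTERN = {"node_type": "attribute", "value": {"$type": "complex"}, "attr": "imaginary"}
SOURCE = "r = value.real\ni = value.imaginary\nj = other.imaginary\n"

print(find_matches(SOURCE, PATTERN, {"value": "complex"}))
```

Only line 2 is reported: value.real has a different attribute name, and other.imaginary is skipped because nothing is known about the type of other.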

Generalizing patterns

node_type: attribute
value:
    $type: complex
$or: [real, imagin]

Learning from feedback / false positives

"else" in for loop without break statement

node_type: for
$anywhere:
    node_type: break
orelse:
    $anything: {}

values = ["foo", "bar", ... ]

for i,value in enumerate(values):
    if value == 'baz':
        print("Found it!")
else:
    print("didn't find 'baz'!")
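A naive version of this check, written directly against Python's ast module, reports every for loop that has an else clause but no break anywhere in its body. The function name for_else_without_break is hypothetical. The sketch has the same blind spots as the first pattern: a return also skips the else clause, and a break inside a nested loop does not leave the outer loop.

```python
import ast

def for_else_without_break(source):
    """Report `for` loops that have an `else` clause but no `break` anywhere in their body."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For) and node.orelse:
            # naive: any break in the body counts, even one that belongs to a nested loop
            has_break = any(isinstance(inner, ast.Break)
                            for stmt in node.body for inner in ast.walk(stmt))
            if not has_break:
                hits.append(node.lineno)
    return hits

SOURCE = '''
values = ["foo", "bar"]
for i, value in enumerate(values):
    if value == 'baz':
        print("Found it!")
else:
    print("didn't find 'baz'!")
'''

print(for_else_without_break(SOURCE))
```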

Learning from false positives (I)

for i,value in enumerate(values):
    if value == 'baz':
        print("Found it!")
        return value

node_type: for
- $anywhere:
      node_type: break
- $anywhere:
      node_type: return
orelse:
    $anything: {}

Learning from false positives (II)

node_type: for
- $anywhere:
      node_type: break
      exclude:
          node_type:
              $or: [while,for]
- $anywhere:
      node_type: return
orelse:
    $anything: {}

for i,value in enumerate(values):
    if value == 'baz':
        print("Found it!")
        for j in ...:
            #...
            break
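Both refinements can be folded into an ast-based sketch: a break only counts when it is not inside a nested for or while loop, and a return anywhere in the body (outside nested function definitions) also counts as leaving the loop. This is again a hypothetical, simplified check (contains and for_else_without_exit are illustrative names); for instance, it does not count a break placed in the else clause of an inner loop, even though that break does exit the outer loop.

```python
import ast

LOOPS = (ast.For, ast.While)
FUNCTIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)

def contains(stmts, kind, stop_at=()):
    """True if a node of type `kind` occurs in `stmts`, without descending into `stop_at` nodes."""
    stack = list(stmts)
    while stack:
        node = stack.pop()
        if isinstance(node, kind):
            return True
        if not isinstance(node, stop_at):
            stack.extend(ast.iter_child_nodes(node))
    return False

def for_else_without_exit(source):
    """Report `for` loops with an `else` clause whose body can neither break nor return."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For) and node.orelse:
            # refinement (II): a break inside a nested for/while belongs to that loop
            has_break = contains(node.body, ast.Break, stop_at=LOOPS)
            # refinement (I): a return also leaves the loop and skips the else clause
            has_return = contains(node.body, ast.Return, stop_at=FUNCTIONS)
            if not (has_break or has_return):
                hits.append(node.lineno)
    return hits

SOURCE = '''
def find(values):
    for i, value in enumerate(values):
        if value == 'baz':
            return value
    else:
        print("not found")

def scan(values, rows):
    for value in values:
        for j in rows:
            break
    else:
        print("done")
'''

print(for_else_without_exit(SOURCE))
```

The loop in find is no longer reported because of its return, while the outer loop in scan still is: its only break belongs to the inner loop.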

Patterns vs. code

(no exception type specified)

node_type: tryexcept
handlers:
    node_type: excepthandler
    type: null

(empty exception handler)

node_type: tryexcept
handlers:
    - body:
          - node_type: pass
      node_type: excepthandler
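The same two checks look like this in a plain ast-based sketch. Python 3's ast module has no TryExcept node (that was the Python 2 name); except clauses live in the handlers list of ast.Try. The helper check_handlers is hypothetical, and it does not cover Python 3.11's except* blocks (ast.TryStar).

```python
import ast

def check_handlers(source):
    """Report bare `except:` clauses and handlers whose body consists only of `pass`."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # Python 2's ast called this node TryExcept; in Python 3 it is ast.Try
        if isinstance(node, ast.Try):
            for handler in node.handlers:
                if handler.type is None:
                    issues.append((handler.lineno, "no exception type specified"))
                if all(isinstance(stmt, ast.Pass) for stmt in handler.body):
                    issues.append((handler.lineno, "empty exception handler"))
    return issues

SOURCE = '''
try:
    risky()
except:
    pass
try:
    risky()
except ValueError:
    log()
'''

print(check_handlers(SOURCE))
```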

Summary & Feedback

1. Storing code as a graph opens up many interesting possibilities. Let's stop thinking of code as text!

2. We can learn from user feedback or even use machine learning to create and adapt code patterns!

3. Everyone can write code checkers! => crowd-source code quality!

Thanks!

www.quantifiedcode.com

https://github.com/quantifiedcode

@quantifiedcode

Andreas Dewes (@japh44)

andreas@quantifiedcode.com

Visit us at booth 629!
