Program Analysis and Transformation


Page 1: Program Analysis and Transformation

Program Analysis and Transformation

Page 2: Program Analysis and Transformation

Apr 19, 2023 COSC6431 2

Program Analysis

• Extracting information, in order to present abstractions of, or answer questions about, a software system

• Static Analysis: Examines the source code

• Dynamic Analysis: Examines the system as it is executing

Page 3: Program Analysis and Transformation

What are we looking for?

• Depends on our goals and the system

– In almost any language, we can find out information about variable usage

– In an OO environment, we can find out which classes use other classes, which are a base of an inheritance structure, etc.

– We can also find blocks of code that can never be executed when the program runs (dead code)

– Typically, the information extracted is in terms of entities and relationships

Page 4: Program Analysis and Transformation

Entities

• Entities are individuals that live in the system, and attributes associated with them.

Some examples:

– Classes, along with information about their superclass, their scope, and ‘where’ in the code they exist.

– Methods/functions and what their return type or parameter list is, etc.

– Variables and what their types are, and whether or not they are static, etc.

Page 5: Program Analysis and Transformation

Relationships

• Relationships are interactions between the entities in the system.

Relationships include:

– Classes inheriting from one another.

– Methods in one class calling the methods of another class, and methods within the same class calling one another.

– A method referencing an attribute.

Page 6: Program Analysis and Transformation

Information format

• Many different formats in use

• Simple but effective: RSF

inherit TRIANGLE SHAPE

• TA is an extension of RSF that includes a schema

$INSTANCE SHAPE Class

• GXL is an XML-like extension of TA

Blow-up factor of 10 or more makes it rather cumbersome
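RSF triples such as the one above can be read with very little code. The following is a minimal Python sketch, assuming whitespace-separated `relation source target` lines with no quoting. The helper name `parse_rsf` and the `SQUARE` fact are illustrative assumptions, not part of any RSF tool.

```python
from collections import defaultdict

def parse_rsf(text):
    """Group whitespace-separated 'relation source target' triples by relation name."""
    relations = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines in this simplified reader
        relation, source, target = parts
        relations[relation].add((source, target))
    return dict(relations)

facts = parse_rsf("""
inherit TRIANGLE SHAPE
inherit SQUARE SHAPE
""")
print(sorted(facts["inherit"]))
```

Keeping each relation as a set of pairs makes the facts directly usable for the set and relation operations that tools such as Grok provide.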

Page 7: Program Analysis and Transformation

Static Analysis

• Involves parsing the source code

• Usually creates an Abstract Syntax Tree

• Borrows heavily from compiler technology but stops before code generation

• Requires a grammar for the programming language

• Can be very difficult to get right
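As an illustration of parsing source code into an Abstract Syntax Tree and extracting facts from it, here is a small Python sketch. It uses Python's standard `ast` module as a stand-in for a language-specific parser; the sample classes and the `extract_facts` helper are hypothetical, and the triple names simply mimic the RSF/TA style used elsewhere in these slides.

```python
import ast

SOURCE = '''
class Shape:
    def area(self):
        return 0

class Triangle(Shape):
    def area(self):
        return helper()

def helper():
    return 1
'''

def extract_facts(source):
    """Return a sorted list of (relation, source, target) triples found in the code."""
    facts = set()
    tree = ast.parse(source)  # parse only; no code is generated or executed
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            facts.add(("$INSTANCE", node.name, "Class"))
            for base in node.bases:
                if isinstance(base, ast.Name):
                    facts.add(("inherit", node.name, base.id))
        elif isinstance(node, ast.FunctionDef):
            facts.add(("$INSTANCE", node.name, "Function"))
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    facts.add(("call", node.name, inner.func.id))
    return sorted(facts)

for fact in extract_facts(SOURCE):
    print(*fact)
```

The sketch identifies methods by bare name, so the two `area` methods collapse into one entity. Resolving names to the right scope is one of the reasons real extractors are hard to get right.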

Page 8: Program Analysis and Transformation

CppETS

• CppETS is a benchmark for C++ extractors

• It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering

• Static analysis research tools typically get about 60% of the problems right

Page 9: Program Analysis and Transformation

Example program

#include <iostream>

class Hello {
public:
    Hello();
    ~Hello();
};

Hello::Hello() { std::cout << "Hello, world.\n"; }
Hello::~Hello() { std::cout << "Goodbye, cruel world.\n"; }

int main() {
    Hello h;    // constructor runs here; destructor runs when h goes out of scope
    return 0;
}

Page 10: Program Analysis and Transformation

Example Q&A

• How many member methods are in the Hello class?

Answer: Two, the constructor (Hello::Hello()) and destructor (Hello::~Hello()).

• Where are these member methods used?

Answer: The constructor is called implicitly when an instance of the class is created. The destructor is called implicitly when the execution leaves the scope of the instance.

Page 11: Program Analysis and Transformation

Static analysis in IDEs

• High-level languages lend themselves better to static analysis needs

– EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems

– Rational Rose does the same with UML and Java

• Unfortunately, most legacy systems are not written in either of these languages

Page 12: Program Analysis and Transformation

Static analysis pipeline

[Diagram: Source code → Parser → Abstract Syntax Tree → Fact extractor → Fact base → Clustering algorithm, Metrics tool, Visualizer]

Page 13: Program Analysis and Transformation

Dynamic Analysis

• Provides information about the run-time behaviour of software systems, e.g.

– Component interactions

– Event traces

– Concurrent behaviour

– Code coverage

– Memory management

• Can be done with a profiler or a debugger

Page 14: Program Analysis and Transformation

Instrumentation

• Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file

• A profiler can be used to examine the output file and extract relevant facts from it

• Instrumentation affects the execution speed and storage space requirements of the system
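The idea of instrumentation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a decorator plays the role of the annotator, and the in-memory list `EVENTS` stands in for the output file or monitoring application. All names here (`instrument`, `call_pairs`, `square`, `sum_of_squares`) are hypothetical.

```python
import functools

EVENTS = []  # stands in for the output file or monitoring application

def instrument(func):
    """Wrap func so that every entry and exit is recorded as an event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append(("exit", func.__name__))
    return wrapper

def call_pairs(events):
    """Recover (caller, callee) facts from the event trace, as a profiler might."""
    stack, pairs = [], set()
    for kind, name in events:
        if kind == "enter":
            if stack:
                pairs.add((stack[-1], name))
            stack.append(name)
        else:
            stack.pop()
    return pairs

@instrument
def square(x):
    return x * x

@instrument
def sum_of_squares(values):
    return sum(square(v) for v in values)

result = sum_of_squares([1, 2])
print(result, call_pairs(EVENTS))
```

Every call now appends two tuples to `EVENTS`, which is a small-scale version of the speed and storage overhead mentioned above.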

Page 15: Program Analysis and Transformation

Instrumentation process

[Diagram: Source code + Annotation script → Annotator → Annotated program → Compiler → Instrumented executable]

Page 16: Program Analysis and Transformation

Dynamic analysis pipeline

[Diagram: Instrumented executable → CPU → Dynamic analysis data → Profiler → Fact base → Clustering algorithm, Metrics tool, Visualizer]

Page 17: Program Analysis and Transformation

Non-instrumented approach

• One can also use debugger log files to obtain dynamic information

• Disadvantage: Limited amount of information provided

• Advantage: Less intrusive approach, more accurate performance measurements

Page 18: Program Analysis and Transformation

Dynamic analysis issues

• Ensuring good code coverage is a key concern

• A comprehensive test suite is required to ensure that all paths in the code will be exercised

• Results may not generalize to future executions
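To see why coverage depends on the test inputs, consider the following Python sketch. It records which lines of a function execute for a given input, using the standard `sys.settrace` hook. The function `classify` and the helper `lines_executed` are hypothetical examples, and line offsets are counted from the `def` line.

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

def lines_executed(func, *args):
    """Run func(*args) and return the line offsets (relative to the def line) that executed."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return hit

only_positive = lines_executed(classify, 5)
both_inputs = only_positive | lines_executed(classify, -1)
print(sorted(only_positive), sorted(both_inputs))
```

A test suite that only ever passes non-negative numbers never executes the `return "negative"` line, so facts gathered from those runs say nothing about that path.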

Page 19: Program Analysis and Transformation

Static vs. Dynamic

Static analysis:

• Reasons over all possible behaviours (general results)

• Conservative and sound

• Challenge: Choose good abstractions

Dynamic analysis:

• Observes a small number of behaviours (specific results)

• Precise and fast

• Challenge: Select representative test cases

Page 20: Program Analysis and Transformation

SWAGKit

• SWAGKit is used to generate software landscapes from source code

• Based on a pipeline architecture with three phases

– Extract (cppx, jfx)

– Manipulate (prep, linkplus, layoutplus)

– Present (lsedit)

• Currently usable for programs written in C/C++ and Java

Page 21: Program Analysis and Transformation

The SWAGKit Pipeline

[Diagram: Source code → cppx → prep → linkplus → layoutplus → lsedit → Landscape]

Page 22: Program Analysis and Transformation

The SWAGKit Pipeline

Function    Filter      Input       Output
Extract     cppx        source      .ta
Manipulate  prep        .ta         .o.ta
Manipulate  linkplus    *.o.ta      out.ln.ta
Manipulate  layoutplus  out.ln.ta   out.ls.ta
Present     lsedit      out.ls.ta   picture

Page 23: Program Analysis and Transformation

cppx & prep

• C/C++ Fact extractor based on gcc (http://swag.uwaterloo.ca/~cppx)

• Extracts facts from one source file at a time

• Facts represent program information as a series of triples

– $INSTANCE x integer == x is an integer

– inherit Student Person == Student inherits from Person

– call foo bar == foo calls bar

• Produces .c.ta files, one per source file

• Use the -g option for gcc parameters

Page 24: Program Analysis and Transformation

cppx & prep

• Prep is a series of scripts written in Grok

• Its function is to “clean up” the facts from cppx so that they are in a form usable by the rest of the pipeline.

• Produces one .o.ta for each .ta

• Can replace “manual” use of cppx & prep with gce

– Edit makefile, replace gcc with gce

– Type make

Page 25: Program Analysis and Transformation

Grok

• A simple scripting language

• A relational algebraic calculator

– Powerful in manipulating binary relations

– Widely used in architecture transformation

• Online documentation

http://swag.uwaterloo.ca/~nsynytskyy/grokdoc/index.html

Page 26: Program Analysis and Transformation

Grok Features

• Set operations

– Union (+), intersection (^), subtraction (-), cross-product (X)

• Binary relation operations

– Union (+), intersection (^), subtraction (-), composition (o, *), projection (.), domain (dom), range (rng), identity (id), inverse (inv), entity (ent), transitive closure (+), and reflexive transitive closure (*)

Page 27: Program Analysis and Transformation

Grok Features Cont.

• Programming constructs

– if else

– for, while

• Arithmetic, comparison, logical operators

– +, -, *, /, %

– <, <=, ==, >=, >, !=

– !, &&, ||

Page 28: Program Analysis and Transformation

Grok Scripts (1)

$ Grok
>> cat := {"Garfield", "Fluffy"}
>> mouse := {"Mickey", "Nancy"}
>> cheese := {"Roquefort", "Swiss"}
>> animals := cat + mouse
>> food := mouse + cheese
>> animalsWhichAreFood := animals ^ food
>> animalsWhichAreNotFood := animals - food
>> animalsWhichAreFood
Mickey
Nancy
>> animals - food
Garfield
Fluffy
>> #food
4
>> mouse <= food
True
>>
>> chase := cat X mouse
>> chase
Garfield Mickey
Garfield Nancy
Fluffy Mickey
Fluffy Nancy
>>
>> eat := chase + mouse X cheese
>> eat
Garfield Mickey
Garfield Nancy
Fluffy Mickey
Fluffy Nancy
Mickey Roquefort
Mickey Swiss
Nancy Roquefort
Nancy Swiss
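For readers without access to Grok, the same session can be approximated with Python's built-in sets. This is a rough analogue rather than a description of how Grok works internally: relations are modelled as sets of pairs, and the snake_case variable names are adaptations of the Grok names.

```python
from itertools import product

cat = {"Garfield", "Fluffy"}
mouse = {"Mickey", "Nancy"}
cheese = {"Roquefort", "Swiss"}

animals = cat | mouse                          # Grok: cat + mouse
food = mouse | cheese                          # Grok: mouse + cheese
animals_which_are_food = animals & food        # Grok: animals ^ food
animals_which_are_not_food = animals - food    # Grok: animals - food

chase = set(product(cat, mouse))               # Grok: cat X mouse
eat = chase | set(product(mouse, cheese))      # Grok: chase + mouse X cheese

print(sorted(animals_which_are_food))
print(sorted(animals_which_are_not_food))
print(len(food), mouse <= food)                # Grok: #food and mouse <= food
print(sorted(eat))
```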

Page 29: Program Analysis and Transformation

Grok Scripts (2)

>> {"Mickey"} . eat
Roquefort
Swiss
>> eat . {"Mickey"}
Garfield
Fluffy
>>
>> eater := dom eat
>> food := rng eat
>> chasedBy := inv chase
>> topOfFoodChain := dom eat - rng eat
>> bottomOfFoodChain := rng eat - dom eat
>> bothEatAndChase := eat ^ chase
>> eatButNotChase := eat - chase
>> chaseButNotEat := chase - eat
>> secondOrderEat := eat o eat
>> anyOrderEat := eat +

if expression then
    statements
else
    statements
end if

loop
    statements
    exit when condition
end loop

for variable in set
    statements
end for
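The relational operators used in this session can be mimicked in Python with a few small functions over sets of pairs. The sketch below is an approximation for study purposes; the function names (`image`, `preimage`, `compose`, `closure` and so on) are chosen here and are not Grok identifiers.

```python
from itertools import product

cat, mouse, cheese = {"Garfield", "Fluffy"}, {"Mickey", "Nancy"}, {"Roquefort", "Swiss"}
chase = set(product(cat, mouse))
eat = chase | set(product(mouse, cheese))

def dom(rel):
    return {a for a, _ in rel}

def rng(rel):
    return {b for _, b in rel}

def inv(rel):
    return {(b, a) for a, b in rel}

def image(s, rel):
    """Grok: s . rel (everything that members of s are related to)."""
    return {b for a, b in rel if a in s}

def preimage(rel, s):
    """Grok: rel . s (everything that is related to a member of s)."""
    return {a for a, b in rel if b in s}

def compose(r1, r2):
    """Grok: r1 o r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def closure(rel):
    """Grok: rel + (transitive closure), computed by iterating to a fixed point."""
    result = set(rel)
    while True:
        extra = compose(result, rel) - result
        if not extra:
            return result
        result |= extra

top_of_food_chain = dom(eat) - rng(eat)
bottom_of_food_chain = rng(eat) - dom(eat)
second_order_eat = compose(eat, eat)
any_order_eat = closure(eat)

print(sorted(image({"Mickey"}, eat)), sorted(preimage(eat, {"Mickey"})))
print(sorted(top_of_food_chain), sorted(bottom_of_food_chain))
```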

Page 30: Program Analysis and Transformation

A real example

containFacts := $1
getdb containFacts
d := dom contain
r := rng contain
e := ent contain
root := d - r
leaves := r - d
rootChildren := root . contain
toKeep := leaves + rootChildren
toDelete := e - toKeep
cc := contain+
delset toDelete
delrel contain
contain := cc
relToFile contain $2

Input: A containment tree

Output: A flattened version of the containment tree
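One plausible reading of this script's intent can be sketched in Python. The sketch assumes that deleting the `toDelete` entities drops every pair that mentions them, so that only pairs from a child of the root to a leaf survive. That is an assumption about the intended result, not a statement of Grok's exact semantics, and the sample tree is invented for illustration.

```python
def flatten(contain):
    """Keep only (root child, leaf) containment pairs from a containment tree."""
    parents = {p for p, _ in contain}
    children = {c for _, c in contain}
    roots = parents - children                    # d - r
    leaves = children - parents                   # r - d
    root_children = {c for p, c in contain if p in roots}
    keep = leaves | root_children                 # toKeep

    # Transitive closure of contain (contain+ in the script).
    closure = set(contain)
    while True:
        extra = {(a, d) for a, b in closure for c, d in contain if b == c} - closure
        if not extra:
            break
        closure |= extra

    # Assumed effect of removing toDelete: drop pairs touching any node outside keep.
    return {(a, b) for a, b in closure if a in keep and b in keep}

contain = {
    ("system", "ui"), ("system", "core"),
    ("ui", "widgets"), ("widgets", "button.c"),
    ("core", "parser.c"),
}
print(sorted(flatten(contain)))
```

In the sample tree the intermediate `widgets` node disappears, and `button.c` ends up directly under `ui`.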

Page 31: Program Analysis and Transformation

linkplus

• Function is to “link” all facts into one large graph

– Combine graphs from .o.ta files

– Resolve inter-compilation unit relationships

– Merge header files together

– Do some cleanup to shrink final graph

• Usage:

– linkplus list_of_files_to_link

• Produces out.ln.ta

Page 32: Program Analysis and Transformation

layoutplus

• Adds

– Clustering of facts based on contain.rsf (created manually or from a clustering algorithm)

– Layout information so that graph can be displayed

– Schema information

• Usage

– layoutplus contain_file out.ln.ta

• Produces out.ls.ta

Page 33: Program Analysis and Transformation

lsedit

• View software landscape produced by previous parts of the pipeline

• Can make changes to landscape and save them

• Usage

– lsedit out.ls.ta

Page 34: Program Analysis and Transformation

Program Representation

• Fundamental issue in re-engineering

– Provides means to generate abstractions

– Provides input to a computational model for analyzing and reasoning about programs

– Provides means for translation and normalization of programs

Page 35: Program Analysis and Transformation

Key questions

• What are the strengths and weaknesses of various representations of programs?

• What levels of abstraction are useful?

Page 36: Program Analysis and Transformation

Abstract Syntax Trees

• A translation of the source text in terms of operands and operators

• Omits superficial details, such as comments, whitespace

• All necessary information to generate further abstractions is maintained

Page 37: Program Analysis and Transformation

AST production

• Four necessary elements to produce an AST:

– Lexical analyzer (turn input strings into tokens)

– Grammar (turn tokens into a parse tree)

– Domain Model (defines the nodes and arcs allowable in the AST)

– Linker (annotates the AST with global information, e.g. data types, scoping etc.)

Page 38: Program Analysis and Transformation

AST example

• Input string: 1 + /* two */ 2

• Parse Tree: [Diagram: the tokens 1 and 2, each classified as int, joined by +]

• AST (without global info): [Diagram: an Add node with arcs arg1 → 1 and arg2 → 2]
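A comparable experiment can be run with Python's standard `ast` module. The input below is a Python rendering of the slide's expression, with the comment moved to the end of a line inside parentheses because Python has no inline `/* */` comments. It shows that the comment and whitespace do not survive into the tree.

```python
import ast

source = "(1 +  # two\n 2)"
tree = ast.parse(source, mode="eval")   # parse a single expression
expr = tree.body                        # the root expression node

# The node kinds correspond to the slide's Add node with two arguments.
print(type(expr).__name__, type(expr.op).__name__)
print(expr.left.value, expr.right.value)
print(ast.dump(expr))
```

The exact text printed by `ast.dump` varies slightly between Python versions, which is why the sketch inspects node attributes directly as well.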

Page 39: Program Analysis and Transformation

Program Transformation

• A program is a structured object with semantics

• Structure allows us to transform a program

• Semantics allow us to compare programs and decide on the validity of transformations

Page 40: Program Analysis and Transformation

Program Transformation

• The act of changing one program into another (from a source language to a target language)

• Used in many areas of software engineering:

– Compiler construction

– Software visualization

– Documentation generation

– Automatic software renovation

Page 41: Program Analysis and Transformation

Application examples

• Converting to a new language dialect

• Migrating from a procedural language to an object-oriented one, e.g. C to C++

• Adding code comments

• Requirement upgrading, e.g. using 4 digits for years instead of 2 (Y2K)

• Structural improvements, e.g. changing GOTOs to control structures

• Pretty printing

Page 42: Program Analysis and Transformation

Simple program transformation

• Modify all arithmetic expressions to reduce the number of parentheses using the formula: (a+b)*c = a*c + b*c

x := (2+5)*3

becomes

x := 2*3 + 5*3
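This transformation can be sketched on Python source with the standard `ast` module (Python 3.9 or later for `ast.unparse`). The class name `Distribute` and the helper `distribute` are hypothetical, Python's `=` stands in for the slide's `:=`, and the sketch makes a single bottom-up pass, so it is an illustration of the idea rather than a complete normalizer.

```python
import ast
import copy

class Distribute(ast.NodeTransformer):
    """Rewrite (a + b) * c into a * c + b * c in one bottom-up pass."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform sub-expressions first
        left = node.left
        if (isinstance(node.op, ast.Mult)
                and isinstance(left, ast.BinOp)
                and isinstance(left.op, ast.Add)):
            a, b, c = left.left, left.right, node.right
            return ast.BinOp(
                left=ast.BinOp(left=a, op=ast.Mult(), right=c),
                op=ast.Add(),
                # c now appears twice, so the second occurrence gets its own copy
                right=ast.BinOp(left=b, op=ast.Mult(), right=copy.deepcopy(c)),
            )
        return node

def distribute(source):
    """Parse source, apply the rewrite, and print the result back as source text."""
    tree = Distribute().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(distribute("x = (2 + 5) * 3"))
```

Because `c` is duplicated, the rewrite only preserves semantics when evaluating `c` has no side effects, which is exactly the kind of validity question that a program's semantics let us decide.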

Page 43: Program Analysis and Transformation

Two types of transformations

• Translation

– Source and target language are different

– Semantics remain the same

• Rephrasing

– Source and target language are the same

– Goal is to improve some aspect of the program such as its understandability or performance

– Semantics might change

Page 44: Program Analysis and Transformation

Translation

• Program synthesis

– Lowers the level of abstraction, e.g. compilation

• Program migration

– Transform to a different language

• Reverse Engineering

– Raises the level of abstraction, e.g. create architectural descriptions from the source code

• Program Analysis

– Reduces the program to one aspect, e.g. control flow

Page 45: Program Analysis and Transformation

Translation taxonomy

Page 46: Program Analysis and Transformation

Rephrasing

• Program normalization

– Decreases syntactic complexity (desugaring), e.g. algebraic simplification of expressions

• Program optimization

– Improves performance, e.g. inlining, common-subexpression and dead code elimination

Page 47: Program Analysis and Transformation

Rephrasing

• Program refactoring

– Improves the design by restructuring while preserving the functionality

• Program obfuscation

– Deliberately makes the program harder to understand

• Software renovation

– Fixes bugs such as Y2K

Page 48: Program Analysis and Transformation

Transformation tools

• There are many transformation tools

• Program-Transformation.org lists 90 of them

• Most are based on term rewriting

• Other solutions use functional programming, lambda calculus, etc.

Page 49: Program Analysis and Transformation

Term rewriting

• The process of simplifying symbolic expressions (terms) by means of a Rewrite System, i.e. a set of Rewrite Rules.

• A Rewrite Rule is of the form

lhs → rhs

where lhs and rhs are term patterns

Page 50: Program Analysis and Transformation

Example Rewrite System

0 + x → x
s(x) + y → s(x + y)
(x + y) + z → x + (y + z)

Under these rewrite rules, the term
((s(s(a)) + s(b)) + c)
will be rewritten as
s(s(a + s(b + c)))
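A rewrite system of this size is small enough to execute by hand or in code. The Python sketch below encodes exactly the three rules listed on this slide, with terms as nested tuples, and rewrites innermost-first until no rule applies. The representation and the function names (`rewrite_step`, `normalize`, `show`) are choices made here for illustration.

```python
def rewrite_step(term):
    """Apply one rule at the root of term; return the new term, or None if no rule matches."""
    if isinstance(term, tuple) and term[0] == "+":
        _, left, right = term
        if left == "0":                                  # 0 + x -> x
            return right
        if isinstance(left, tuple) and left[0] == "s":   # s(x) + y -> s(x + y)
            return ("s", ("+", left[1], right))
        if isinstance(left, tuple) and left[0] == "+":   # (x + y) + z -> x + (y + z)
            return ("+", left[1], ("+", left[2], right))
    return None

def normalize(term):
    """Rewrite sub-terms first, then the root, until no rule applies anywhere."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(t) for t in term[1:])
    reduced = rewrite_step(term)
    return term if reduced is None else normalize(reduced)

def show(term):
    """Render a term as text, parenthesizing every sum."""
    if isinstance(term, str):
        return term
    if term[0] == "s":
        return f"s({show(term[1])})"
    return f"({show(term[1])} + {show(term[2])})"

start = ("+", ("+", ("s", ("s", "a")), ("s", "b")), "c")
print(show(start))
print(show(normalize(start)))
```

Running the sketch prints the starting term and the normal form it reaches; changing `start` is an easy way to explore how the three rules interact.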

Page 51: Program Analysis and Transformation

TXL

• A generalized source-to-source translation system

• Uses a context-free grammar to describe the structures to be transformed

• Rule specification uses a by-example style

• Has been used to process billions of lines of code for Y2K purposes

Page 52: Program Analysis and Transformation

TXL programs

• TXL programs consist of two parts:

– Grammar for the input language

– Transformation Rules

• Let’s look at some examples…

Page 53: Program Analysis and Transformation

Calculator.Txl - Grammar

% Part I. Syntax specification

define program
    [expression]
end define

define expression
    [term]
  | [expression] [addop] [term]
end define

define term
    [primary]
  | [term] [mulop] [primary]
end define

define primary
    [number]
  | ( [expression] )
end define

define addop
    '+
  | '-
end define

define mulop
    '*
  | '/
end define

Page 54: Program Analysis and Transformation

Calculator.Txl - Rules

% Part 2. Transformation rules

rule main
    replace [expression]
        E [expression]
    construct NewE [expression]
        E [resolveAddition] [resolveSubtraction] [resolveMultiplication] [resolveDivision] [resolveParentheses]
    where not
        NewE [= E]
    by
        NewE
end rule

rule resolveAddition
    replace [expression]
        N1 [number] + N2 [number]
    by
        N1 [+ N2]
end rule

rule resolveSubtraction …
rule resolveMultiplication …
rule resolveDivision …

rule resolveParentheses
    replace [primary]
        ( N [number] )
    by
        N
end rule

Page 55: Program Analysis and Transformation

DotProduct.Txl

% Form the dot product of two vectors,
% e.g., (1 2 3).(3 2 1) => 10

define program
    ( [repeat number] ) . ( [repeat number] )
  | [number]
end define

rule main
    replace [program]
        ( V1 [repeat number] ) . ( V2 [repeat number] )
    construct Zero [number]
        0
    by
        Zero [addDotProduct V1 V2]
end rule

rule addDotProduct V1 [repeat number] V2 [repeat number]
    deconstruct V1
        First1 [number] Rest1 [repeat number]
    deconstruct V2
        First2 [number] Rest2 [repeat number]
    construct ProductOfFirsts [number]
        First1 [* First2]
    replace [number]
        N [number]
    by
        N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
end rule

Page 56: Program Analysis and Transformation

Sort.Txl

% Sort.Txl - simple numeric bubble sort

define program
    [repeat number]
end define

rule main
    replace [repeat number]
        N1 [number] N2 [number] Rest [repeat number]
    where
        N1 [> N2]
    by
        N2 N1 Rest
end rule
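The sorting rule works because it is reapplied until it no longer matches anywhere in the sequence. The Python sketch below imitates that behaviour with a loop that swaps the first out-of-order adjacent pair it finds. It is a simplified model of rule application; TXL's actual search strategy is not modelled, and the function names are hypothetical.

```python
def rewrite_once(numbers):
    """Swap the first adjacent pair with N1 > N2; return None if no pair matches."""
    for i in range(len(numbers) - 1):
        if numbers[i] > numbers[i + 1]:
            return numbers[:i] + [numbers[i + 1], numbers[i]] + numbers[i + 2:]
    return None

def sort_by_rewriting(numbers):
    """Reapply the swap rule until it no longer matches anywhere."""
    current = list(numbers)
    while True:
        rewritten = rewrite_once(current)
        if rewritten is None:
            return current   # fixed point reached: the sequence is sorted
        current = rewritten

print(sort_by_rewriting([3, 1, 2]))
```

Each successful swap removes one inversion, so the loop always terminates, and the sequence is sorted when the rule finds nothing left to rewrite.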

Page 57: Program Analysis and Transformation

Other TXL constructs

compounds
    -> :=
end compounds

keys
    var procedure exists inout out
end keys

function isAnAssignmentTo X [id]
    match [statement]
        X := Y [expression]
end function

Page 58: Program Analysis and Transformation

www.txl.ca

• Guided Tour

• Many examples

• Reference manual

• Download TXL for many platforms

Page 59: Program Analysis and Transformation

Example uses

• HTML Pretty Printing of Source Code

• Language to Language Translation

• Design Recovery from Source

• Improvement of security problems

• Program instrumentation and measurement

• Logical formula simplification and interpretation.