Analyzing Software using Deep Learning

Introduction

Prof. Dr. Michael Pradel
Software Lab, TU Darmstadt

Subscribe to the course via Piazza: piazza.com/tu-darmstadt.de/summer2018/20000999iv


About Me

• Michael Pradel
• At TU Darmstadt since 2014
• Before joining TUDA
  - Master-level studies in Dresden and Paris
  - Master thesis at EPFL, Switzerland
  - PhD at ETH Zurich, Switzerland
  - Postdoctoral researcher at UC Berkeley, USA

About the Software Lab

• My research group since 2014
• Focus: Tools and techniques for building reliable, efficient, and secure software
  - Program analysis
  - Test generation
• Thesis and job opportunities

Plan for Today

• Introduction
  - What the course is about
  - Why it is interesting
  - How it can help you
• Organization
  - Lectures and final exam
  - Course project
• Basics
  - Program analysis
  - Deep learning

What is Program Analysis?

• Automated analysis of program behavior, e.g., to
  - find programming errors
  - optimize performance
  - find security vulnerabilities

[Figure: Input → Program → Output]

[Figure, later builds of the same slide: Program, Additional information, multiple Inputs and Outputs]

Why Do We Need It?

Basis for various tools that make developers productive
• Compilers
• Bug finding tools
• Performance profilers
• Code completion
• Automated testing
• Code summarization/documentation

Traditional Approaches

• Analysis has built-in knowledge about the problem to solve
• Significant human effort to create a program analysis
  - Conceptual challenges
  - Implementation effort
• Analyze a single program at a time

Learning from Existing Data

• Huge amount of existing code ("big code")
• Programs are regular and repetitive
• Machine learning: Extract knowledge and apply in new contexts
• Learn how to ...
  - ... complete partial code
  - ... use an API
  - ... fix programming errors
  - ... create inputs for testing

Deep Learning

Class of machine learning algorithms
• Neural network architectures
• "Deep" = multiple layers
• Features and representation of inputs are extracted automatically

Revolutionizes entire areas

This Course

Intersection of program analysis and deep learning
• Some of the basics: E.g., program representations, neural network architectures
• State-of-the-art research results: Based on recent research papers
• Hands-on experience: Coding project

Not This Course

What this course is not about
• Detailed coverage of program analysis
• Detailed coverage of machine learning
• Programming tutorial for TensorFlow

Check out related courses
• E.g., "Program Testing and Analysis" (winter semester)


Organization

• Weekly meetings
  - 6 lectures
  - 6 Q&A sessions for course project
• Reading material
• 2nd half of semester (from May 14):
  - Course project
• July 9: Submission of project
• Exam period: Written exam

Grading

50% written exam
• Content of lectures and reading material
• Open book, one hour
• Will test your understanding, not your memory

50% course project
• Effectiveness of your implementation
• Documentation and code quality

Piazza

Platform for discussions, in-class quizzes, and sharing additional material
• Please register and enroll for the class
• Use it for all questions related to the course
• Messages sent to all students go via Piazza (not TUCaN!)

Subscribe to the course via Piazza: piazza.com/tu-darmstadt.de/summer2018/20000999iv

Learning Material

There is no script or single book that covers everything
• Slides and hand-written notes: Available after lecture
• Pointers to papers, book chapters, and web resources

Course Project

• Individual project
• Same task for everybody
• Implement and evaluate a neural network that predicts/generates code
• Based on existing tools
  - TensorFlow library for machine learning
  - Python
• More details on May 14


Program Analysis

Many ways to represent (parts of) a program
• Sequence of characters
• Sequence of tokens
• Abstract syntax tree
• Control flow graph
• Call graph
• etc.
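To make one of these representations concrete, here is a minimal sketch that derives a call graph from source code. It is not from the original slides: it uses Python's standard ast module, the sample program and the helper name call_graph are hypothetical, and it only records direct calls to plain names (no method calls, no dynamic dispatch).

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = '''
def helper(x):
    return x + 1

def main():
    print(helper(2))
'''

def call_graph(source):
    """Map each function name to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            callees = set()
            # Visit every node inside the function body and collect simple calls.
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callees.add(node.func.id)
            graph[func.name] = sorted(callees)
    return graph

print(call_graph(SOURCE))
```

Each key is a node of the graph and each listed callee is an outgoing edge. A realistic analysis would also have to resolve methods, imports, and higher-order calls.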


Tokens

Tokenizer (or lexer)
• Part of compiler
• Splits sequence of characters into subsequences called tokens

E.g., for Java, six kinds of tokens:
• Identifiers, e.g., MyClass
• Keywords, e.g., if
• Separators, e.g., . or {
• Operators, e.g., * or ++
• Literals, e.g., 23 or "hi"
• Comments, e.g., /* bla */
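The splitting step can be sketched with a small regular-expression tokenizer. This is a simplified illustration under stated assumptions, not a real Java lexer: the patterns in TOKEN_SPEC are hypothetical, cover only a handful of keywords and operators, and the function name tokenize is chosen for this example.

```python
import re

# Hypothetical, simplified patterns for a Java-like language.
# Order matters: comments and literals are tried before operators.
TOKEN_SPEC = [
    ("Comment",    r"/\*.*?\*/|//[^\n]*"),
    ("Literal",    r'"(?:\\.|[^"\\])*"|\d+'),
    ("Keyword",    r"\b(?:if|else|while|for|return|class|int|new)\b"),
    ("Identifier", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("Operator",   r"\+\+|--|==|!=|<=|>=|[+\-*/=<>!]"),
    ("Separator",  r"[.,;(){}\[\]]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC),
    re.DOTALL,
)

def tokenize(source):
    """Split a character sequence into (kind, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

example = 'if (count++ > 23) { MyClass.log("hi"); } /* bla */'
for kind, text in tokenize(example):
    print(f"{kind:<10} {text}")
```

The example string contains one token of each of the six kinds named above.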

Abstract Syntax Tree

• Tree representation of source code
• "Abstract" because some details of syntax omitted
  - E.g., { in Java
• Nodes: Construct in source code
• Edges: Parent-child relationship
• Check out Esprima for obtaining ASTs of JavaScript: http://esprima.org/demo
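The same idea can be tried without leaving Python. The sketch below uses Python's standard ast module instead of Esprima (a swapped-in tool, so the node names differ from Esprima's). The input expression and the helper name show are assumptions made for this example.

```python
import ast

# Hypothetical input; note the parentheses in the source text.
source = "total = price * (1 + tax)"
tree = ast.parse(source)

def show(node, depth=0):
    """Return one line per AST node, indented by its depth in the tree."""
    label = type(node).__name__
    if isinstance(node, ast.Name):
        label += f"({node.id})"
    elif isinstance(node, ast.Constant):
        label += f"({node.value!r})"
    lines = ["  " * depth + label]
    # Edges of the tree: parent-child relationships between constructs.
    for child in ast.iter_child_nodes(node):
        lines.extend(show(child, depth + 1))
    return lines

print("\n".join(show(tree)))
```

The printed tree contains two BinOp nodes but no node for the parentheses: grouping is encoded by the tree structure, which is the sense in which the tree is "abstract".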

Deep Learning: Example

Example: Handwriting recognition
• Goal: Recognize digits 0..9
• Easy for a human but challenging for a computer
• Idea: Learn from a large number of training examples
• Deep learning: > 99% accuracy

Following slides based on Chapter 1 of neuralnetworksanddeeplearning.com

Universal Computation

• Networks of NAND perceptrons can simulate every circuit containing only NAND gates
• Can express arbitrary computations!


Example: Adding Two Bits

NAND gate:

Network of perceptrons:
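The figures for this slide did not survive extraction, so the following is a sketch of the idea rather than a reproduction of the slide. It assumes a perceptron that outputs 1 when the weighted sum plus bias is positive, and the NAND parameters from the cited chapter (weights -2 and -2, bias 3). The function names and the exact wiring of the adder are choices made for this example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def nand(x1, x2):
    # Assumed parameters: weights -2, -2 and bias 3 give NAND behavior.
    return perceptron((x1, x2), (-2, -2), 3)

def add_two_bits(x1, x2):
    """Add two bits using a network built only from NAND perceptrons."""
    a = nand(x1, x2)
    b = nand(x1, a)
    c = nand(x2, a)
    bit_sum = nand(b, c)   # equals x1 XOR x2
    carry = nand(a, a)     # equals x1 AND x2
    return bit_sum, carry

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, add_two_bits(x1, x2))
```

Every gate in add_two_bits is the same perceptron with the same fixed weights. Only the wiring differs.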

Challenge: Set Weights and Biases

• More complex networks can perform arbitrary computations
• How to decide on the weights and biases?
• Option 1: Hand-tune them
  → Infeasible for complex networks
• Option 2: Learn them
  → Key idea behind machine learning with neural networks

Quiz: Cost Function

• Recognition of hand-written digits
• Only digits 0, 1, and 2
• Training examples:

  Example   Actual        Desired
  1         (0, 1, 0)^T   (0.5, 0.5, 0)^T
  2         (1, 0, 0)^T   (1, 0, 0)^T

• What is the value of the cost function?
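The slide that defines the cost function was an image and is missing here. Assuming it is the quadratic cost from the cited chapter, C = 1/(2n) * sum over all n training examples of ||desired - actual||^2, the quiz can be worked through as follows. The function name quadratic_cost is hypothetical.

```python
def quadratic_cost(actual, desired):
    """Assumed quadratic cost: 1/(2n) times the sum of squared distances."""
    n = len(actual)
    total = 0.0
    for a, y in zip(actual, desired):
        # Squared Euclidean distance between one actual and one desired vector.
        total += sum((yi - ai) ** 2 for ai, yi in zip(a, y))
    return total / (2 * n)

# Values copied from the table above.
actual = [(0, 1, 0), (1, 0, 0)]
desired = [(0.5, 0.5, 0), (1, 0, 0)]
print(quadratic_cost(actual, desired))  # 0.125
```

Example 1 contributes 0.5^2 + 0.5^2 = 0.5, example 2 contributes 0, so under this assumption C = 0.5 / (2 * 2) = 0.125.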

Goal: Minimize Cost Function

• Goal of learning: Find weights and biases that minimize the cost function
• Approach: Gradient descent
  - Compute gradient of C: Vector of partial derivatives
  - "Move" closer toward minimum step-by-step
  - Learning rate determines step size
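A minimal sketch of these steps on a toy problem: the cost function below is a hypothetical bowl-shaped function of one weight w and one bias b with its minimum at w = 3, b = -1, chosen only so that the gradient can be written by hand. It is not the cost of a neural network.

```python
def cost(w, b):
    # Hypothetical cost with a single minimum at (3, -1).
    return (w - 3.0) ** 2 + (b + 1.0) ** 2

def gradient(w, b):
    # Vector of partial derivatives (dC/dw, dC/db) of the cost above.
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

def gradient_descent(w, b, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient; the learning rate scales each step."""
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

w, b = gradient_descent(0.0, 0.0)
print(w, b, cost(w, b))
```

With a learning rate of 0.1, each step shrinks the distance to the minimum by a constant factor. A much larger learning rate (above 1.0 for this particular function) would overshoot and diverge.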

Training Examples

• Effort of computing gradient depends on number of examples
• Stochastic gradient descent
  - Use small sample of all examples
  - Compute estimate of true gradient
• Epochs and mini-batches
  - Split training examples into k mini-batches
  - Train network with each mini-batch
  - Epoch: Each mini-batch used exactly once
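The epoch and mini-batch loop can be sketched on a toy model. Everything below is an assumption for illustration: a linear model y = w*x + b stands in for a neural network, the data is synthetic and noise-free, and the names make_data and sgd are hypothetical.

```python
import random

def make_data(n=200, seed=0):
    """Synthetic, noise-free training examples for y = 2x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]

def sgd(data, k=10, epochs=50, learning_rate=0.5, seed=1):
    """Mini-batch stochastic gradient descent for the model y = w*x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # One epoch: shuffle, split into k mini-batches, use each exactly once.
        shuffled = data[:]
        rng.shuffle(shuffled)
        batch_size = len(shuffled) // k  # leftover examples are dropped
        for i in range(k):
            batch = shuffled[i * batch_size:(i + 1) * batch_size]
            # Gradient estimate of the mean of 0.5 * (w*x + b - y)^2 on this batch.
            gw = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * gw
            b -= learning_rate * gb
    return w, b

w, b = sgd(make_data())
print(w, b)
```

Each update uses only 20 of the 200 examples, so it is cheaper than a full gradient computation but only estimates the true gradient.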
