Machine Learning in Software Engineering


Page 1: Machine Learning in Software Engineering

NEW TRENDS IN LEARNING FOR SOFTWARE ENGINEERING

Alaa Hamouda

Department of Computer Engineering,

Engineering Faculty, Al-Azhar University, Egypt

Page 2: Machine Learning in Software Engineering

Agenda

• Introduction

• Software Engineering Phases

• Machine Learning Overview

• Applications of ML in each SWE phase:

– Project Planning

– Requirements

– Design

– Implementation

– Testing

– Maintenance

• Conclusion


Page 3: Machine Learning in Software Engineering

Problem Definition

• There is a need to meet the challenge of developing and maintaining large and complex software systems.

• Machine learning methods have been playing an increasingly important role in many software development and maintenance tasks.


Page 4: Machine Learning in Software Engineering

SWE Phases


Page 5: Machine Learning in Software Engineering

Overview of ML

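• Machine learning methods fall into two broad categories: supervised learning and unsupervised learning. Supervised learning learns a target function from labeled examples. Unsupervised learning attempts to learn patterns and associations from a set of objects that have no class labels attached.

• Supervised learners can be divided into eager classifiers, which build a general model from the training data before any query arrives (e.g., decision trees), and lazy classifiers, which simply store the training data and defer generalization until a new instance must be classified (e.g., k-nearest neighbors).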
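A minimal sketch contrasting the two families. The eager learner here is a nearest-centroid classifier, chosen only because it is short (it is not a method from the slides); the lazy learner is k-nearest neighbors. The module data and the FP/NFP labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Eager learner: all generalization happens in fit(); the data can then be discarded."""

    def fit(self, X, y):
        sums, counts = defaultdict(lambda: [0.0] * len(X[0])), Counter(y)
        for features, label in zip(X, y):
            sums[label] = [s + f for s, f in zip(sums[label], features)]
        self.centroids = {lbl: [s / counts[lbl] for s in vec] for lbl, vec in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lbl: euclidean(x, self.centroids[lbl]))


class KNearestNeighbors:
    """Lazy learner: fit() only stores the examples; all work is deferred to predict()."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.examples = list(zip(X, y))
        return self

    def predict(self, x):
        nearest = sorted(self.examples, key=lambda ex: euclidean(x, ex[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical modules described by (size in KLOC, cyclomatic complexity)
X = [[1, 2], [2, 3], [1, 1], [8, 9], [9, 8], [7, 9]]
y = ["NFP", "NFP", "NFP", "FP", "FP", "FP"]
eager = NearestCentroid().fit(X, y)
lazy = KNearestNeighbors(k=3).fit(X, y)
```

Both classify `[8, 8]` as FP, but the eager model only compares it against two stored centroids, whereas the lazy model scans every training example at query time.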


Page 6: Machine Learning in Software Engineering

Overview of ML


Page 7: Machine Learning in Software Engineering

Overview of ML


Page 8: Machine Learning in Software Engineering

The loan data (reproduced): the class attribute is whether the loan is approved or not

Page 9: Machine Learning in Software Engineering

A decision tree from the loan data: decision nodes and leaf nodes (classes)
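The slide shows the finished tree but not how decision nodes are chosen. A minimal ID3-style sketch, assuming information gain as the split criterion; the nine loan records below are hypothetical and are not the table from the slide.

```python
import math
from collections import Counter

# Hypothetical loan records (NOT the slide's table): own_house, has_job, credit -> approved
ATTRS = ["own_house", "has_job", "credit"]
RAW = [
    ("yes", "no", "fair", "yes"), ("yes", "no", "good", "yes"), ("yes", "yes", "good", "yes"),
    ("no", "yes", "fair", "yes"), ("no", "no", "fair", "no"), ("no", "no", "good", "no"),
    ("no", "no", "fair", "no"), ("no", "yes", "good", "yes"), ("yes", "no", "fair", "yes"),
]
ROWS = [dict(zip(ATTRS + ["approved"], r)) for r in RAW]


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf node: a class
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    # Decision node: one branch per value of the best attribute
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}


def classify(tree, row):
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]  # raises KeyError for values never seen in training
    return tree


tree = build_tree(ROWS, ATTRS, "approved")
```

On this toy data `own_house` has the highest gain, so it becomes the root, and `has_job` then separates the remaining records.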


Page 11: Machine Learning in Software Engineering

Project Planning

• Reported statistics put the failure rate of software projects at 70%.

• Cost overruns of 189% have been reported.

• Research shows that inaccurate estimation is the root cause of most software project failures.

Page 12: Machine Learning in Software Engineering

Size Estimation

• Size --> Effort --> Cost

• Twenty-eight of the sixty collected publications (almost 47%) deal with how to build models that predict or estimate certain properties of the software development process or its artifacts.

Page 13: Machine Learning in Software Engineering

Function Point

• Internal Logical File (ILF): a file accessed and maintained by the application under development.

• External Interface File (EIF): a file accessed by the application's processing logic but maintained by another application.

• External Input (EI): an elementary process that processes data coming from outside the application boundary; it maintains one or more ILFs.

• External Output (EO): an elementary process that sends data outside the application boundary; an EO presents information to the user through processing logic in addition to retrieving data.

• External Inquiry (EQ): an elementary process that sends data outside the application boundary; an EQ presents information to the user only through retrieval of data from ILFs/EIFs, with no data manipulation or processing logic.
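The slide defines the five component types but not how they are turned into a size. A minimal sketch using the standard IFPUG complexity weights and value adjustment factor; the component counts and the general system characteristic ratings are hypothetical.

```python
# IFPUG complexity weights per component type: (low, average, high)
WEIGHTS = {
    "EI": (3, 4, 6),
    "EO": (4, 5, 7),
    "EQ": (3, 4, 6),
    "ILF": (7, 10, 15),
    "EIF": (5, 7, 10),
}
LEVELS = {"low": 0, "average": 1, "high": 2}


def unadjusted_fp(counts):
    """counts maps (component_type, complexity) to the number of such components."""
    return sum(n * WEIGHTS[kind][LEVELS[level]] for (kind, level), n in counts.items())


def adjusted_fp(ufp, gsc_ratings):
    """Scale UFP by the value adjustment factor from the 14 general system characteristics."""
    if len(gsc_ratings) != 14 or not all(0 <= r <= 5 for r in gsc_ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    return ufp * (0.65 + 0.01 * sum(gsc_ratings))


# Hypothetical counts for a small application
counts = {("EI", "average"): 5, ("EO", "low"): 3, ("EQ", "average"): 2,
          ("ILF", "average"): 2, ("EIF", "low"): 1}
ufp = unadjusted_fp(counts)         # 5*4 + 3*4 + 2*4 + 2*10 + 1*5 = 65
afp = adjusted_fp(ufp, [3] * 14)    # 65 * (0.65 + 0.42)
```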

Page 14: Machine Learning in Software Engineering

Size estimation (Cont’d)

Input:

• Function points

• Project domains

• Number of components of each type:

– Number of menu components

– Number of input components

– Number of output components

ML Algorithm:

• Neural network

Output:

• LOC, to be fed to the cost estimation stage
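The slides name a neural network that maps size inputs to LOC but give neither its architecture nor its training data. A minimal NumPy sketch under stated assumptions: one hidden layer of 8 tanh units trained by full-batch gradient descent on 40 synthetic projects. The project-domain input (which would typically be one-hot encoded) is omitted, and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical projects: [function points, menu, input, output components] -> LOC
X = rng.uniform([50, 2, 5, 5], [500, 20, 60, 60], size=(40, 4))
y = (X @ np.array([50.0, 30.0, 20.0, 25.0]) + rng.normal(0, 200, size=40)).reshape(-1, 1)

# Standardize inputs and target so plain gradient descent behaves well
x_mean, x_std = X.mean(axis=0), X.std(axis=0)
y_mean, y_std = y.mean(), y.std()
Xs, ys = (X - x_mean) / x_std, (y - y_mean) / y_std

# One hidden layer of 8 tanh units, linear output unit
W1, b1 = rng.normal(0, 0.5, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, size=(8, 1)), np.zeros(1)


def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(forward(Xs)[1], ys)
learning_rate = 0.05
for _ in range(2000):
    hidden, pred = forward(Xs)
    grad_pred = 2 * (pred - ys) / len(Xs)                  # dLoss/dPred for the MSE
    grad_hidden = (grad_pred @ W2.T) * (1 - hidden ** 2)   # backprop through tanh
    W2 -= learning_rate * (hidden.T @ grad_pred)
    b2 -= learning_rate * grad_pred.sum(axis=0)
    W1 -= learning_rate * (Xs.T @ grad_hidden)
    b1 -= learning_rate * grad_hidden.sum(axis=0)
final_loss = mse(forward(Xs)[1], ys)


def predict_loc(features):
    """Map raw [FP, menus, inputs, outputs] to an LOC estimate."""
    scaled = (np.asarray(features, dtype=float) - x_mean) / x_std
    return float(forward(scaled.reshape(1, -1))[1][0, 0] * y_std + y_mean)
```

`predict_loc([200, 10, 30, 30])` returns the LOC estimate that would be passed on to the effort estimation stage.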

Page 15: Machine Learning in Software Engineering

Size estimation (Cont’d)

Page 16: Machine Learning in Software Engineering

Effort Estimation

Input:

• Lines of code (generated by the size estimation stage)

• Scale factors

• Cost drivers

Algorithm:

• Fuzzy inference engine

Output:

• Estimated effort (e.g., person-hours)

Page 17: Machine Learning in Software Engineering

Inputs (scale factors)

Factor Explanation

Precedentedness (PREC) Reflects the previous experience of the organization

Development Flexibility (FLEX) Reflects the degree of flexibility in the development process

Risk Resolution (RESL) Reflects the extent of risk analysis carried out

Team Cohesion (TEAM) Reflects how well the development team members know each other and work together

Process Maturity (PMAT) Reflects the process maturity of the organization

Factor Explanation

LOC Lines of code

Page 18: Machine Learning in Software Engineering

Inputs (Cost Drivers)

Attribute Type Description

RELY Product Required system reliability

CPLX Product Complexity of system modules

DOCU Product Extent of documentation required

DATA Product Size of database used

RUSE Product Required percentage of reusable components

TIME Computer Execution time constraint

PVOL Computer Volatility of development platform

STOR Computer Memory constraints

ACAP Personnel Capability of project analysts

PCON Personnel Personnel continuity

PCAP Personnel Programmer capability

PEXP Personnel Programmer experience in project domain

AEXP Personnel Analyst experience in project domain

LTEX Personnel Language and tool experience

TOOL Project Use of software tools

SCED Project Development schedule compression

SITE Project Extent of multisite working and quality of inter-site communications
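The scale factors and cost drivers above are those of the COCOMO II model. The slides do not give the rules of the fuzzy engine, so as a point of reference here is a minimal sketch of the classic COCOMO II post-architecture equation these inputs feed. It assumes the published COCOMO II.2000 calibration (A = 2.94, B = 0.91) and the nominal scale-factor ratings; every cost driver is set to its nominal multiplier of 1.0.

```python
import math

A, B = 2.94, 0.91  # COCOMO II.2000 calibration constants

# Nominal-rating values of the five scale factors (COCOMO II.2000)
NOMINAL_SCALE_FACTORS = {"PREC": 3.72, "FLEX": 3.04, "RESL": 4.24, "TEAM": 3.29, "PMAT": 4.68}

COST_DRIVERS = ["RELY", "CPLX", "DOCU", "DATA", "RUSE", "TIME", "PVOL", "STOR", "ACAP",
                "PCON", "PCAP", "PEXP", "AEXP", "LTEX", "TOOL", "SCED", "SITE"]


def cocomo2_effort(ksloc, scale_factors, effort_multipliers):
    """Person-months = A * Size**E * prod(EM_i), where E = B + 0.01 * sum(SF_j)."""
    exponent = B + 0.01 * sum(scale_factors.values())
    return A * ksloc ** exponent * math.prod(effort_multipliers.values())


# 10 KSLOC coming from the size stage; all cost drivers nominal
nominal_drivers = {name: 1.0 for name in COST_DRIVERS}
effort_pm = cocomo2_effort(10, NOMINAL_SCALE_FACTORS, nominal_drivers)  # about 37 person-months
```

Because the scale factors sit in the exponent, they matter more as projects grow, while each cost driver scales effort by a constant factor.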

Page 19: Machine Learning in Software Engineering

Using Fuzzy Logic

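A fuzzy inference engine replaces crisp equations with overlapping linguistic categories and if-then rules. A minimal zero-order Sugeno sketch with two inputs; the fuzzy sets, the four rules, and their effort values are hypothetical and are not the rule base used in the presented work.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets and rule base (not taken from the slides)
SIZE = {"small": (-1, 0, 100), "large": (0, 100, 201)}        # KLOC
COMPLEXITY = {"low": (-3, 1, 5), "high": (1, 5, 9)}            # rating on a 1..5 scale
RULES = [  # IF size IS s AND complexity IS c THEN effort = e person-months
    ("small", "low", 10.0), ("small", "high", 20.0),
    ("large", "low", 40.0), ("large", "high", 80.0),
]


def fuzzy_effort(kloc, complexity):
    """Zero-order Sugeno inference: min for AND, weighted average to defuzzify."""
    weighted_sum = total_strength = 0.0
    for size_label, cx_label, effort in RULES:
        strength = min(triangular(kloc, *SIZE[size_label]),
                       triangular(complexity, *COMPLEXITY[cx_label]))
        weighted_sum += strength * effort
        total_strength += strength
    if total_strength == 0:
        raise ValueError("inputs lie outside every fuzzy set")
    return weighted_sum / total_strength
```

At 50 KLOC and complexity 3, every input is half "small"/"large" and half "low"/"high", so all four rules fire equally and `fuzzy_effort(50, 3)` blends them to 37.5.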

Page 20: Machine Learning in Software Engineering

Effort Estimation directly from UCP

In the previous method:

• FP (size) --> LOC (size) --> Effort

Another method:

• UCP (size) --> Effort (directly)

Page 21: Machine Learning in Software Engineering

Effort Estimation


Page 22: Machine Learning in Software Engineering

Use Case Point Calculation

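The calculation slide is an image, so here is a minimal sketch of Karner's classic use case point formula, which is the usual baseline for this kind of work. The technical and environmental factor sums are taken as precomputed inputs, the default productivity factor of 20 person-hours per UCP is Karner's original proposal, and the project figures are hypothetical.

```python
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}


def use_case_points(actors, use_cases, t_factor, e_factor):
    """Karner's UCP = (UAW + UUCW) * TCF * ECF.

    t_factor: weighted sum of the 13 technical factor ratings.
    e_factor: weighted sum of the 8 environmental factor ratings.
    """
    uaw = sum(ACTOR_WEIGHTS[k] * n for k, n in actors.items())
    uucw = sum(USE_CASE_WEIGHTS[k] * n for k, n in use_cases.items())
    tcf = 0.6 + 0.01 * t_factor
    ecf = 1.4 - 0.03 * e_factor
    return (uaw + uucw) * tcf * ecf


def effort_hours(ucp, productivity_factor=20):
    """Effort = UCP * PF, where PF is person-hours per use case point."""
    return ucp * productivity_factor


# Hypothetical project: UAW = 9, UUCW = 95, TCF = 0.9, ECF = 0.95
ucp = use_case_points({"simple": 2, "average": 2, "complex": 1},
                      {"simple": 3, "average": 5, "complex": 2}, t_factor=30, e_factor=15)
hours = effort_hours(ucp)
```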

Page 23: Machine Learning in Software Engineering

Productivity


Page 24: Machine Learning in Software Engineering

Project Complexity

• Level 1: the project team is familiar with this type of project and has developed similar projects in the past. The number and type of interfaces are simple. The project will be installed under normal conditions where high security or safety factors are not required. Moreover, in Level 1 projects around 20% of the design or implementation is reused from similar earlier projects.

• Level 2: similar to Level 1, except that only about 10% of the project is reused.

Page 25: Machine Learning in Software Engineering

Project Complexity (Cont’d)

• Level 3: the technology, interfaces, and installation conditions are normal. Furthermore, no part of the project has been previously designed or implemented.

• Level 4: the project must be installed on a complicated topology or architecture, such as a distributed system. Moreover, at this level the number of variables and interfaces is large.

• Level 5: similar to Level 4, but with additional constraints such as a special type of security or high safety factors.
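To use the project complexity level as a numeric input to a model, the descriptions above have to be turned into a rule. A minimal sketch of one possible encoding; the parameter names are hypothetical, the 20% and 10% thresholds are read from the "around" figures above, and the case of a simple topology with special security constraints, which the slides do not cover, is ignored.

```python
def complexity_level(familiar, reuse_fraction, complex_topology, special_constraints):
    """Hypothetical encoding of the five project complexity levels.

    familiar: the team has built similar projects before
    reuse_fraction: share of design/implementation reused from earlier projects
    complex_topology: distributed architecture or many variables/interfaces
    special_constraints: special security or high safety requirements
    """
    if complex_topology:
        return 5 if special_constraints else 4
    if familiar and reuse_fraction >= 0.2:
        return 1
    if familiar and reuse_fraction >= 0.1:
        return 2
    return 3  # normal conditions, nothing reused
```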


Page 26: Machine Learning in Software Engineering

Effort Estimation


Page 27: Machine Learning in Software Engineering

Effort Estimation (Cont’d)

The results show that the proposed ANN model outperforms:

• Regression models by 8%

• UCP models by 50%
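The slide does not say which accuracy measure the percentages refer to. Effort models are commonly compared with the mean magnitude of relative error (MMRE) and PRED(25), so here is a minimal sketch of both, assuming these metrics; the actual and estimated efforts are hypothetical.

```python
def mre(actual, predicted):
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error (lower is better)."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)


def pred(actuals, predictions, level=0.25):
    """PRED(level): fraction of estimates whose MRE is within the given level."""
    return sum(mre(a, p) <= level for a, p in zip(actuals, predictions)) / len(actuals)


# Hypothetical actual efforts (person-hours) and two models' estimates
actual = [100, 200, 400]
ann_estimates = [110, 190, 380]   # MREs 0.10, 0.05, 0.05
ucp_estimates = [150, 120, 500]   # MREs 0.50, 0.40, 0.25
```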



Page 29: Machine Learning in Software Engineering

Requirements Analysis

Business Analysis / System Analysis

Page 30: Machine Learning in Software Engineering

Requirements Analysis


Page 31: Machine Learning in Software Engineering

Requirements Analysis


Page 32: Machine Learning in Software Engineering

Requirements Analysis

Lexicon Phase I Phase II

User Noun Actor

fills Verb Action

the Article -------

form Noun Object
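The table shows two phases: Phase I tags each word with its part of speech from a lexicon, and Phase II maps the tags to requirement roles. A minimal sketch, assuming a tiny hypothetical lexicon and a simple positional rule (a noun before the verb is the actor, a noun after it is the object); real systems would use a full part-of-speech tagger and richer rules.

```python
# Hypothetical lexicon used in Phase I: word -> part of speech
LEXICON = {"user": "Noun", "clerk": "Noun", "form": "Noun", "order": "Noun",
           "fills": "Verb", "submits": "Verb", "the": "Article", "a": "Article"}


def phase1_tag(sentence):
    """Phase I: look each word up in the lexicon."""
    return [(word, LEXICON.get(word.lower(), "Unknown")) for word in sentence.rstrip(".").split()]


def phase2_roles(tagged):
    """Phase II: noun before the verb -> Actor, verb -> Action, noun after it -> Object."""
    roles, seen_verb = [], False
    for word, pos in tagged:
        if pos == "Verb":
            roles.append((word, "Action"))
            seen_verb = True
        elif pos == "Noun":
            roles.append((word, "Object" if seen_verb else "Actor"))
        else:
            roles.append((word, None))  # articles and unknown words get no role
    return roles


roles = phase2_roles(phase1_tag("User fills the form"))
```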


Page 33: Machine Learning in Software Engineering

Requirements

• Reverse engineering is needed where we have legacy systems that are critical to the operation of the organization that uses them and that must still be maintained.

• Most legacy systems were developed before software engineering techniques were widely used. Thus they may be poorly structured, and their documentation may be either out of date or non-existent.

• To maintain a legacy system, the first task is to recover its design or specification from its source or executable code.


Page 35: Machine Learning in Software Engineering

Design

1. Finding fault-prone components for reuse

2. UI Design

Page 36: Machine Learning in Software Engineering

Components Re-use

• Software quality classification models can be used to indicate which program modules are fault-prone (FP) and not fault-prone (NFP).

• These models can be used to select the best candidate modules for reuse.

Page 37: Machine Learning in Software Engineering

Components Re-use

Attribute Description

U_1 Number of unique operators

N_1 Total number of operators

U_2 Number of unique operands

N_2 Total number of operands

V(G) McCabe’s cyclomatic complexity

N_L Number of logical operators

LOC Lines of code

ELOC Executable lines of code
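The raw counts in the table are usually combined into derived measures before being fed to an FP/NFP classifier. A minimal sketch of the standard Halstead measures (vocabulary, length, volume, difficulty, effort) and McCabe's V(G) = E - N + 2P; the slide lists only the raw attributes, so which derived measures the classifier actually used is not stated, and the module counts are hypothetical.

```python
import math


def halstead(u1, n1, u2, n2):
    """Halstead measures from unique/total operator (U_1, N_1) and operand (U_2, N_2) counts."""
    vocabulary = u1 + u2
    length = n1 + n2
    volume = length * math.log2(vocabulary)
    difficulty = (u1 / 2) * (n2 / u2)
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "difficulty": difficulty, "effort": difficulty * volume}


def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph."""
    return edges - nodes + 2 * components


# Hypothetical module: counts from a metrics tool, plus its control-flow graph size
metrics = halstead(u1=8, n1=20, u2=8, n2=12)   # volume = 32 * log2(16) = 128
vg = cyclomatic_complexity(edges=4, nodes=4)   # a single if/else: V(G) = 2
```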

Page 38: Machine Learning in Software Engineering

User Interface Design

• Learnability is an important aspect of usability

• Users lose up to 40% of their time due to “frustrating experiences” with computers, one of the most common causes of these frustrations being missing, hard-to-find, or unusable features of the software.

Page 39: Machine Learning in Software Engineering

User Interface Design

• Nielsen characterizes a highly learnable system as one “allowing users to reach a reasonable level of usage proficiency within a short time”.

• A web usage map is mined using label sequential rules (LSRs).

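A label sequential rule has the form X -> Y, where Y is a sequence and X is Y with some items replaced by the wildcard `*`; as we understand the usual definition, its support is the fraction of sequences containing Y and its confidence is the number of sequences containing Y divided by the number containing X. A minimal sketch in that spirit over hypothetical UI action logs; the session data and the rule are invented for illustration.

```python
WILDCARD = "*"


def contains(sequence, pattern):
    """True if pattern occurs in sequence as a subsequence; '*' matches any single item."""
    pos = 0
    for item in sequence:
        if pos < len(pattern) and pattern[pos] in (WILDCARD, item):
            pos += 1
    return pos == len(pattern)


def lsr_support_confidence(sequences, lhs, rhs):
    """Rule lhs -> rhs, where lhs is rhs with some items replaced by '*'."""
    with_lhs = sum(contains(s, lhs) for s in sequences)
    with_rhs = sum(contains(s, rhs) for s in sequences)
    support = with_rhs / len(sequences)
    confidence = with_rhs / with_lhs if with_lhs else 0.0
    return support, confidence


# Hypothetical UI action logs, one sequence per user session
SESSIONS = [
    ["open", "menu", "save"],
    ["open", "toolbar", "save"],
    ["open", "menu", "print", "save"],
    ["open", "save"],
]
support, confidence = lsr_support_confidence(SESSIONS, ["open", "*", "save"], ["open", "menu", "save"])
```

Here two of the three sessions that reach "save" via some intermediate action go through the menu, which is the kind of pattern a usage map can reveal about how users find a feature.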

Page 40: Machine Learning in Software Engineering

User Interface Design



Page 42: Machine Learning in Software Engineering

Implementation

• Implementation is a core process in the software engineering life cycle.

• One of the challenges in this phase is modularization (or remodularization).

• Genetic algorithms have been used successfully to address this problem.

• The objective is to improve the module quality (MQ). All variants of MQ combine cohesion and coupling into a single weighted fitness function.
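A minimal sketch of GA-based modularization, assuming TurboMQ (the MQ variant used by the Bunch clustering tool) as the fitness: each cluster i contributes 2μ_i / (2μ_i + ε_i), where μ_i counts dependency edges inside the cluster and ε_i counts edges crossing its boundary. The GA operators (tournament selection, uniform crossover, single-gene mutation, elitism), their parameters, and the dependency graph are illustrative choices, not the ones from the cited work.

```python
import random


def turbo_mq(edges, assignment):
    """TurboMQ: sum of cluster factors 2*mu / (2*mu + eps) over clusters.

    mu = edges inside the cluster, eps = edges crossing its boundary.
    Clusters with no internal edges contribute 0 and are simply not summed.
    """
    intra, inter = {}, {}
    for a, b in edges:
        ca, cb = assignment[a], assignment[b]
        if ca == cb:
            intra[ca] = intra.get(ca, 0) + 1
        else:
            inter[ca] = inter.get(ca, 0) + 1
            inter[cb] = inter.get(cb, 0) + 1
    return sum(2 * mu / (2 * mu + inter.get(c, 0)) for c, mu in intra.items())


def genetic_modularize(edges, n_nodes, max_clusters, pop_size=30, generations=100, seed=0):
    """Search module assignments (one cluster id per node) that maximize TurboMQ."""
    rng = random.Random(seed)

    def fitness(individual):
        return turbo_mq(edges, individual)

    population = [[rng.randrange(max_clusters) for _ in range(n_nodes)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover, then occasionally move one node to a random cluster
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            if rng.random() < 0.3:
                child[rng.randrange(n_nodes)] = rng.randrange(max_clusters)
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical dependency graph: two tightly knit groups joined by one call (2 -> 3)
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best_assignment, best_mq = genetic_modularize(EDGES, n_nodes=6, max_clusters=3)
```

Splitting the graph into its two natural groups scores 6/7 + 6/7 ≈ 1.71, higher than putting everything into one module (1.0), which is exactly the cohesion/coupling trade-off MQ rewards.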


Page 43: Machine Learning in Software Engineering

Implementation (Cont’d)

• Clustering has also been applied to package coupling, to reduce overall package size and to explore the relationship between design and code level software structure.

• Additional objectives might include closeness to original module structure, business goals, technical constraints, testability, and other metrics that may be important in finding a good module structure.


Page 44: Machine Learning in Software Engineering

Implementation (Cont’d)


Page 45: Machine Learning in Software Engineering

Implementation (Cont’d)

• Refactoring means rewriting existing source code to improve its readability, reusability, or structure without affecting its meaning or behavior.

• Project managers want to know which locations are likely to need refactoring: refactoring improves the understandability of the code, but it also consumes development time.

Page 46: Machine Learning in Software Engineering

Implementation (Cont’d)

• Researchers examined evolution data from the version control systems of open-source projects.

• ArgoUML and the Spring framework are examples; both are written in Java and consist of 5,000 and 10,000 classes, respectively.

• In Java each class is usually placed in a separate file, so the researchers treat files as equivalent to classes and focus their analysis on files.

Page 47: Machine Learning in Software Engineering

Implementation (Cont’d)

The features used can be divided into the following categories:

• Size

This category contains size measures such as lines of code from an evolution perspective: linesAdded, linesModified, or linesDeleted relative to the total LOC (lines of code) of a file.

Page 48: Machine Learning in Software Engineering

Implementation (Cont’d)

• Team

The number of authors of a file influences the way the software is developed. The more authors work on the changes, the higher the expected probability of rework and mistakes.

• Complexity of existing solution

According to the laws of software evolution, software continuously becomes more complex. Changes become harder to add as the software becomes harder to understand and as the contracts between existing parts have to be retained. Therefore, they investigate changeCount, the number of changes over the entire history of each file.

Page 49: Machine Learning in Software Engineering

Implementation (Cont’d)

• New Requirements

In object-oriented systems, new classes are usually added when new requirements have to be satisfied. They therefore use, as a feature, whether a file was newly introduced during the prediction period.

• Relational Aspects

The most important features in this category are change couplings, such as the number of changes/revisions in which other files were committed together with the file.
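A minimal sketch of how the feature categories above could be extracted from version history. The commit representation, the helper name `extract_features`, and the sample history are hypothetical; the feature names follow the slides, but the exact definitions used in the study may differ.

```python
from collections import defaultdict


def extract_features(commits, loc, files_at_period_start):
    """Per-file features from version history.

    commits: list of {"author": str, "changes": {file: (added, modified, deleted)}}
    loc: current lines of code per file
    files_at_period_start: files that already existed when the prediction period began
    """
    totals = defaultdict(lambda: [0, 0, 0])
    authors, change_count, co_changes = defaultdict(set), defaultdict(int), defaultdict(int)
    for commit in commits:
        changed = commit["changes"]
        for name, lines in changed.items():
            totals[name] = [t + n for t, n in zip(totals[name], lines)]
            authors[name].add(commit["author"])
            change_count[name] += 1
            if len(changed) > 1:  # committed together with at least one other file
                co_changes[name] += 1
    return {
        name: {
            "linesAdded": totals[name][0] / loc[name],      # Size, relative to file LOC
            "linesModified": totals[name][1] / loc[name],
            "linesDeleted": totals[name][2] / loc[name],
            "authors": len(authors[name]),                   # Team
            "changeCount": change_count[name],               # Complexity of existing solution
            "isNew": name not in files_at_period_start,      # New requirements
            "coChanges": co_changes[name],                   # Relational aspects
        }
        for name in change_count
    }


# Hypothetical history of three Java files
COMMITS = [
    {"author": "ann", "changes": {"A.java": (10, 5, 0), "B.java": (2, 0, 1)}},
    {"author": "bob", "changes": {"A.java": (0, 3, 4)}},
    {"author": "bob", "changes": {"C.java": (50, 0, 0), "A.java": (1, 1, 1)}},
]
features = extract_features(COMMITS, {"A.java": 100, "B.java": 50, "C.java": 50}, {"A.java", "B.java"})
```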


Page 50: Machine Learning in Software Engineering

Implementation (Cont’d)

• With the described features, the number of refactorings is predicted.

Page 51: Machine Learning in Software Engineering

Implementation (Cont’d)

• Decision trees and neural networks are used as classifiers.

• The F-measure was about 65%.

• Several features, such as the lines activity rate and the number of lines altered per commit, provide much information for assessing refactorings.

• The structure of the system is also crucial for refactorings, as the number of co-changed files and the number of files introduced during maintenance are relevant features.
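The F-measure reported above is the harmonic mean of precision and recall (recall is also called the detection rate). A minimal sketch of the computation; the actual and predicted labels are hypothetical and do not come from the study.

```python
def precision_recall_f1(actual, predicted):
    """Binary labels: True means 'file was refactored' in this illustration."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum(p and not a for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical: 4 files were refactored; the classifier flags 5 files, 3 of them correctly
actual = [True] * 4 + [False] * 6
predicted = [True, True, True, False, True, True, False, False, False, False]
p, r, f1 = precision_recall_f1(actual, predicted)  # 0.6, 0.75, 0.667
```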



Page 53: Machine Learning in Software Engineering

Testing

• Software quality models help ensure the reliability of the delivered products.

• Early detection of fault-prone software components enables verification experts to concentrate their time and resources on the problem areas of the software system under development.

• Accurate prediction of fault-prone modules allows verification and validation activities to focus on the critical software components.

Page 54: Machine Learning in Software Engineering

Testing (Cont’d)


Page 55: Machine Learning in Software Engineering

Testing (Cont’d)

• Decision trees correctly predicted 79.3% of high-development-effort fault-prone modules (detection rate), while the trees generated from the best parameter combinations correctly identified 88.4% of those modules on average.


Page 57: Machine Learning in Software Engineering

Maintenance

• Software maintenance is widely recognized to be the most expensive and time-consuming aspect of the software process.

• A relevance relation maps a tuple of system elements to a value indicating how related they are.

• Software change repositories reflect the history of the system, including the actions that create new relationships and strengthen existing relationships in the software.

Page 58: Machine Learning in Software Engineering

Maintenance (Cont’d)


Page 59: Machine Learning in Software Engineering

Maintenance (Cont’d)

• Software entities include documents, source files, routines, modules, variables, and even the entire software system.

• A relevance relation is a predictor that maps tuples of two or more software entities to a value r quantifying how relevant, that is, connected or related, the entities are to each other.

• r shows the strength of relevance among the entities.
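One simple way to learn a relevance relation from a change repository is to measure how often entities change together. A minimal sketch for pairs of files, assuming r(a, b) is the share of the changes to a in which b also changed; this is one possible predictor (the slides allow tuples of two or more entities and do not fix the measure), and the file names and history are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_relevance(transactions):
    """r(a, b) = share of the changes to a in which b changed too (a directed, pairwise measure)."""
    changes, together = Counter(), Counter()
    for files in transactions:
        files = set(files)
        changes.update(files)
        for a, b in combinations(sorted(files), 2):
            together[(a, b)] += 1
            together[(b, a)] += 1
    return {(a, b): n / changes[a] for (a, b), n in together.items()}


# Hypothetical change transactions (sets of files modified together)
HISTORY = [{"parser.c", "lexer.c"}, {"parser.c", "lexer.c", "ui.c"}, {"parser.c"}, {"ui.c"}]
r = co_change_relevance(HISTORY)
```

The relation is asymmetric: every change to `lexer.c` also touched `parser.c` (r = 1.0), but only two of the three changes to `parser.c` touched `lexer.c` (r ≈ 0.67).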


Page 60: Machine Learning in Software Engineering

Maintenance (Cont’d)


Page 61: Machine Learning in Software Engineering

Maintenance Effort Prediction

• If the predictions are based on formal software development effort prediction models, such as the estimation part of Function Point Analysis, essential differences in characteristics between software development and software maintenance are neglected.

• The focus of software development is the creation of software, but the focus of software maintenance is more the change of software.

• The development of a software application typically is a one-of-a-kind project, but the maintenance activities on an application usually comprise a large number of tasks carried out over a long period of time in a relatively stable environment.


Page 62: Machine Learning in Software Engineering

Maintenance Effort Prediction

• Researchers collected data on:

– 109 randomly selected maintenance tasks

– 70 applications

– The size of the applications varied from a few thousand lines of code (LOC) to about 500,000 LOC

– The age of the applications varied from less than a year to more than 20 years

– The functions of the applications included payroll, order entry, billing and invoicing, inventory control, service management, and personnel administration.

Page 63: Machine Learning in Software Engineering

Maintenance Effort Prediction

The following data was collected for each maintenance task:

• Type of maintenance task, i.e., corrective or perfective.

• Priority of task, i.e., high, medium or low priority.

• Maintainer’s knowledge and confidence about how to solve the task immediately after having read or heard the task specification.

• Years of experience as maintainer, and on the maintained application.

• Education level of the maintainer.

• Work-hours (effort) spent on the task.

• Task size and programming language.

• Age and size of the changed application.

Page 64: Machine Learning in Software Engineering

Maintenance Effort Prediction

Most important features:

• Cause: corrective maintenance = 0, otherwise = 1.

• Change: more than 50% of the effort is believed to be spent on updating code, compared to inserting and deleting code = 0, otherwise = 1.

• Mode: more than 50% of the effort is believed to be spent on developing new modules (new-module mode) = 0, otherwise (embedded mode) = 1.

• Confidence: the maintainer believes he knows how to solve the task when the task specification is first read or heard = 0 (high confidence), otherwise = 1 (medium or low confidence).
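A minimal sketch of how the 0/1 coding above could feed a regression model. The study used neural networks and regression; this sketch shows ordinary least squares only, and the six tasks and their work-hours are hypothetical (constructed so that the fitted coefficients are easy to check).

```python
import numpy as np


def encode_task(corrective, mostly_updating, mostly_new_modules, high_confidence):
    """0/1 coding of the four most important features, as defined on the slide."""
    return [0 if corrective else 1,           # Cause
            0 if mostly_updating else 1,      # Change
            0 if mostly_new_modules else 1,   # Mode
            0 if high_confidence else 1]      # Confidence


# Hypothetical tasks: (corrective, mostly_updating, mostly_new_modules, high_confidence, hours)
TASKS = [
    (True, True, False, True, 13),
    (False, True, False, True, 23),
    (True, False, False, True, 16),
    (True, False, True, True, 8),
    (True, True, False, False, 19),
    (False, False, True, False, 24),
]
X = np.array([[1.0] + encode_task(*task[:4]) for task in TASKS])  # leading 1.0 = intercept
y = np.array([task[4] for task in TASKS], dtype=float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # [intercept, cause, change, mode, confidence]


def predict_hours(*features):
    return float(np.dot([1.0] + encode_task(*features), coef))
```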


Page 65: Machine Learning in Software Engineering

Maintenance Effort Prediction

Features with less effect:

• Type of language

• Maintainer experience

• Task priority

• Application age

• Application size


Page 66: Machine Learning in Software Engineering

Maintenance Effort Prediction

• Neural networks and regression were used for effort prediction.

• The prediction accuracy was acceptable (error of 60%).

• A recommended use of an effort prediction model is, therefore, to support expert predictions.

• Another important use of a formal prediction model may be to support the collection and analysis of maintenance data in order to enable improvement of the maintenance process and product.

Page 67: Machine Learning in Software Engineering

Open Problems

• Most of the presented work is immature, and many related issues are still open.

• Machine learning can help in the requirements engineering phase by developing knowledge-based systems and ontologies to manage requirements and model problem domains.

Page 68: Machine Learning in Software Engineering

Open Problems (Cont’d)

• One of the most difficult problems is the problem of transforming requirements into architectures. Much research is needed in this area to address the ever increasing complexity of functional and non-functional requirements.


Page 69: Machine Learning in Software Engineering

Open Problems (Cont’d)

• One area that has received some attention is the use of automated algorithms with machine learning to make repair assignments.

• In any case, more studies with respect to the appropriate criteria for selecting assignment policy, reward mechanisms and management goals need to be undertaken.


Page 70: Machine Learning in Software Engineering

Conclusion

• The existing work shows that software engineering is fertile ground for the application of machine learning methods.

• There is clearly increasing interest in the niche area where machine learning and software engineering meet.

Page 71: Machine Learning in Software Engineering

Conclusion (cont’d)

• The strength of machine learning methods lies in their sound mathematical and logical justifications.

• The power of machine learning methods does not come from a particular induction method, but instead from proper formulation of the problems and from crafting the representation to make learning tractable.

Page 72: Machine Learning in Software Engineering

Conclusion (cont’d)

• Machine learning can play a useful role in the different phases of software engineering: project planning, requirements analysis, design, implementation, testing, and even maintenance.

• This interest in applying machine learning to software engineering tasks is expected to increase significantly, especially with the growing interest in empirical software engineering.

Page 73: Machine Learning in Software Engineering

Thank you very much
