Machine Learning in Software Engineering
NEW TRENDS IN LEARNING FOR SOFTWARE ENGINEERING
Alaa Hamouda
Department of Computer Engineering,
Engineering Faculty, Al-Azhar University, Egypt
1
Agenda
• Introduction
• Software Engineering Phases
• Machine Learning Overview
• Applications of ML in SWE with each process:
– Project Planning
– Requirements
– Design
– Implementation
– Testing
– Maintenance
• Conclusion
2
Problem Definition
• There is a need to meet the challenge of developing and maintaining large and complex software systems.
• Machine learning methods have been playing an increasingly important role in many software development and maintenance tasks.
3
SWE Phases
4
Overview of ML
• Machine learning methods fall into the following broad categories: supervised learning and unsupervised learning. Supervised learning deals with learning a target function from labeled examples. Unsupervised learning attempts to learn patterns and associations from a set of objects that do not have attached class labels.
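• Supervised learners can be divided into eager classifiers, which build a model from the training data before any query arrives (e.g. decision trees), and lazy classifiers, which defer generalization until a query must be classified (e.g. k-nearest neighbors).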
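The eager/lazy distinction can be illustrated with a minimal sketch. The data set, the one-threshold "stump" learner, and the function names below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from collections import Counter

# Hypothetical labeled examples: (feature value, class label).
TRAIN = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
         (7.0, "high"), (8.0, "high"), (9.0, "high")]


def train_stump(examples):
    """Eager learner: choose the threshold with the fewest training errors."""
    values = sorted({x for x, _ in examples})  # distinct feature values
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = Counter(label for x, label in examples if x <= threshold)
        right = Counter(label for x, label in examples if x > threshold)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0]
        errors = sum(
            1 for x, label in examples
            if label != (left_label if x <= threshold else right_label)
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    # The returned model no longer needs the training data.
    return lambda x: left_label if x <= threshold else right_label


def nearest_neighbor(examples, x):
    """Lazy learner: no training step; scan the stored examples per query."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]


stump = train_stump(TRAIN)  # all learning work happens here, before any query
print(stump(2.5), nearest_neighbor(TRAIN, 2.5))
print(stump(8.5), nearest_neighbor(TRAIN, 8.5))
```

The eager learner pays its cost once at training time, while the lazy learner pays it on every query.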
5
Overview of ML
6
Overview of ML
7
8
The loan data (reproduced): approved or not
9
A decision tree from the loan data: decision nodes and leaf nodes (classes)
10
Project Planning
• Statistics report a failure rate of 70% for software projects.
• Cost overruns of 189% have been reported.
• Research shows that inaccurate estimation is the root cause of most software project failures.
11
Size Estimation
• Size --> Effort --> Cost
• Twenty-eight of the sixty collected publications (almost 47%) deal with how to build models that predict or estimate some property of the software development process or its artifacts.
12
Function Point
13
• Internal Logical File (ILF): a file accessed and maintained by the application under development.
• External Interface File (EIF): a file accessed by the processing logic but maintained by another application.
• External Input (EI): an elementary process that processes data that comes from outside the application boundary.
– Maintains an ILF
• External Output (EO): an elementary process that sends data outside the application boundary.
– An EO presents information to the user through processing logic in addition to retrieval of data.
• External Inquiry (EQ): an elementary process that sends data outside the application boundary.
– An EQ presents information to the user through retrieval of data from an ILF/EIF.
– No data manipulation or processing logic.
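As a rough illustration of how the five component types above turn into a size figure, the sketch below computes an unadjusted function point count. The weights are the average-complexity weights commonly quoted for IFPUG-style counting and are an assumption here, since the slides list no weights. The component counts are hypothetical.

```python
# Average-complexity weights commonly quoted for function point counting.
# Assumption: the slides do not state which weights were used.
AVERAGE_WEIGHTS = {"EI": 4, "EO": 5, "EQ": 4, "ILF": 10, "EIF": 7}


def unadjusted_function_points(counts, weights=AVERAGE_WEIGHTS):
    """Sum each component count multiplied by its weight."""
    unknown = set(counts) - set(weights)
    if unknown:
        raise ValueError(f"unknown component types: {sorted(unknown)}")
    return sum(weights[kind] * n for kind, n in counts.items())


# Hypothetical application: 6 inputs, 4 outputs, 3 inquiries, 2 ILFs, 1 EIF.
counts = {"EI": 6, "EO": 4, "EQ": 3, "ILF": 2, "EIF": 1}
ufp = unadjusted_function_points(counts)
print(ufp)  # 6*4 + 4*5 + 3*4 + 2*10 + 1*7 = 83
```

A full count would rate each component as low, average, or high complexity rather than using the average weight throughout.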
Size estimation (Cont’)
Input:
• Function points
• Project domains
• Number of component types:
– Number of menu components
– Number of input components
– Number of output components
ML Algorithm:
• Neural Network
Output:
• LOC to be fed to the cost estimation stage
14
Size estimation (Cont’)
15
Effort Estimation
Input:
• Lines of code (generated by the size estimation)
• Scale factors
• Cost drivers
Algorithm:• Fuzzy Inference Engine
Output:• Estimated efforts (e.g. man-hours)
16
Inputs (scale factors)
Factor                          Explanation
Precedentedness (PREC)          Reflects the previous experience of the organization.
Development Flexibility (FLEX)  Reflects the degree of flexibility in the development process.
Risk Resolution (RESL)          Reflects the extent of risk analysis carried out.
Team Cohesion (TEAM)            Reflects how well the development team members know each other and work together.
Process Maturity (PMAT)         Reflects the process maturity of the organization.
17
Factor  Explanation
LOC     Lines of code
Inputs (Cost Drivers)
Attribute  Type       Description
RELY       Product    Required system reliability
CPLX       Product    Complexity of system modules
DOCU       Product    Extent of documentation required
DATA       Product    Size of database used
RUSE       Product    Required percentage of reusable components
TIME       Computer   Execution time constraint
PVOL       Computer   Volatility of development platform
STOR       Computer   Memory constraints
ACAP       Personnel  Capability of project analysts
PCON       Personnel  Personnel continuity
PCAP       Personnel  Programmer capability
PEXP       Personnel  Programmer experience in project domain
AEXP       Personnel  Analyst experience in project domain
LTEX       Personnel  Language and tool experience
TOOL       Project    Use of software tools
SCED       Project    Development schedule compression
SITE       Project    Extent of multisite working and quality of inter-site communications
18
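The scale factors and cost drivers listed above are the inputs of the COCOMO II model. The sketch below shows the plain (crisp) COCOMO II effort equation so the role of each input is visible. It is not the fuzzy inference engine the slides describe. The constants A and B and all rating values are assumptions taken from commonly published COCOMO II.2000 figures, and the project is hypothetical. The result is in person-months rather than man-hours.

```python
import math

# Calibration constants commonly published for COCOMO II.2000 (assumption).
A = 2.94
B = 0.91


def cocomo2_effort(ksloc, scale_factors, effort_multipliers):
    """Return estimated effort in person-months.

    ksloc: size in thousands of source lines of code
    scale_factors: rating values for PREC, FLEX, RESL, TEAM, PMAT
    effort_multipliers: cost-driver multipliers (1.0 means nominal)
    """
    # Scale factors act on the exponent, so they matter more for large projects.
    exponent = B + 0.01 * sum(scale_factors.values())
    # Cost drivers act as simple multipliers on the result.
    return A * ksloc ** exponent * math.prod(effort_multipliers.values())


# Hypothetical 20 KSLOC project with assumed rating values.
scale_factors = {"PREC": 3.72, "FLEX": 3.04, "RESL": 4.24,
                 "TEAM": 3.29, "PMAT": 4.68}
effort_multipliers = {"RELY": 1.10, "CPLX": 1.17}
effort_pm = cocomo2_effort(20.0, scale_factors, effort_multipliers)
print(f"{effort_pm:.1f} person-months")
```

A fuzzy approach would replace the fixed rating values with membership functions over each input, but the way size, scale factors, and cost drivers combine stays the same.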
Using Fuzzy Logic
19
Effort Estimation directly from UCP
In the previous method:
• FP (size) --> LOC (size) --> Effort
Another method:
• UCP (size) --> Effort (directly)
20
Effort Estimation
21
Use Case Point Calculation
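The original figure for this slide is not available in the transcript. As a stand-in, here is a minimal sketch of the use case point calculation as it is usually presented (Karner's method). The weights, the 20 hours per UCP productivity figure, and the sample counts are assumptions, not values from the slides.

```python
# Weights usually quoted for Karner's use case points (assumption).
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}


def use_case_points(use_cases, actors, technical_factor, environmental_factor):
    """Return adjusted use case points.

    use_cases, actors: counts per complexity class
    technical_factor: weighted sum of the technical factor ratings
    environmental_factor: weighted sum of the environmental factor ratings
    """
    uucw = sum(USE_CASE_WEIGHTS[c] * n for c, n in use_cases.items())
    uaw = sum(ACTOR_WEIGHTS[c] * n for c, n in actors.items())
    tcf = 0.6 + 0.01 * technical_factor
    ecf = 1.4 - 0.03 * environmental_factor
    return (uucw + uaw) * tcf * ecf


# Hypothetical project.
ucp = use_case_points(
    use_cases={"simple": 2, "average": 3, "complex": 1},
    actors={"simple": 1, "average": 1, "complex": 2},
    technical_factor=30,
    environmental_factor=20,
)
HOURS_PER_UCP = 20  # productivity figure often quoted; an assumption here
print(f"UCP = {ucp:.2f}, effort = {ucp * HOURS_PER_UCP:.0f} hours")
```

The fixed hours-per-UCP multiplier is the step that a learned model such as a neural network would replace.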
22
Productivity
23
Project Complexity
• Level 1: the project team is familiar with this type of project and has developed similar projects in the past. The number and type of interfaces are simple. The project will be installed in normal conditions where high security or safety factors are not required. In addition, around 20% of the design or implementation of a Level 1 project is reused from earlier, similar projects.
• Level 2: similar to Level 1, except that only about 10% of the project is reused.
24
Project Complexity (Cont’d)
• Level 3: the technology, interfaces, and installation conditions are normal, but no part of the project has been previously designed or implemented.
• Level 4: the project must be installed on a complicated topology/architecture such as a distributed system. Moreover, the number of variables and interfaces is large.
• Level 5: similar to Level 4 but with additional constraints such as a special type of security or high safety factors.
25
Effort Estimation
26
Effort Estimation (Cont’d)
The results show that the proposed ANN model outperforms:
• Regression models by 8%
• UCP models by 50%
27
Requirements Analysis
29
Business Analysis System Analysis
Requirements Analysis
30
Requirements Analysis
31
Requirements Analysis
Lexicons  Phase I  Phase II
User      Noun     Actor
fills     Verb     Action
the       Article  -------
form      Noun     Object
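The two phases in the table can be sketched as a lookup followed by a simple positional rule. The tiny lexicon and the rule "a noun before the verb is the actor, a noun after it is the object" are assumptions inferred from the single example row set above, not the method the slides used.

```python
# Hypothetical lexicon covering only the example sentence.
LEXICON = {"user": "Noun", "fills": "Verb", "the": "Article", "form": "Noun"}


def tag_words(sentence):
    """Phase I: attach a part-of-speech tag to every word."""
    return [(word, LEXICON.get(word.lower(), "Unknown"))
            for word in sentence.split()]


def assign_roles(tagged):
    """Phase II: map tags to requirement roles using word position."""
    rows = []
    seen_verb = False
    for word, tag in tagged:
        if tag == "Verb":
            seen_verb = True
            role = "Action"
        elif tag == "Noun":
            role = "Object" if seen_verb else "Actor"
        else:
            role = None  # articles and unknown words carry no role
        rows.append((word, tag, role))
    return rows


rows = assign_roles(tag_words("User fills the form"))
for row in rows:
    print(row)
```

A real system would use a trained part-of-speech tagger and a parser instead of a fixed dictionary and a positional rule.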
32
Requirements
• Reverse engineering applies where legacy systems are critical to the operation of the organization that uses them and must still be maintained.
• Most legacy systems were developed before software engineering techniques were widely used. Thus they may be poorly structured and their documentation may be either out-of-date or non-existent.
• To maintain a legacy system, the first task is to recover its design or specification from its source or executable code.
33
Design
1. Finding Fault Prone components for reuse
2. UI Design
35
Components Re-use
• Software quality classification models can be used to indicate which program modules are fault-prone (FP) and which are not fault-prone (NFP).
• These models can be used to select the best candidate modules for reuse.
36
Components Re-use
Attribute  Description
U_1        Number of unique operators
N_1        Total number of operators
U_2        Number of unique operands
N_2        Total number of operands
V(G)       McCabe’s cyclomatic complexity
N_L        Number of logical operators
LOC        Lines of code
ELOC       Executable lines of code
37
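The first four attributes are Halstead's basic counts, from which further measures are usually derived before being fed to a classifier. The sketch below computes the standard derived Halstead measures. The sample counts are hypothetical, and whether the classification models in the slides used the raw or the derived values is not stated.

```python
import math


def halstead_measures(u1, n1, u2, n2):
    """Derive standard Halstead measures from the four basic counts.

    u1, n1: unique and total operators; u2, n2: unique and total operands.
    """
    vocabulary = u1 + u2
    length = n1 + n2
    volume = length * math.log2(vocabulary)
    difficulty = (u1 / 2) * (n2 / u2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# Hypothetical module: 10 unique operators, 50 total; 6 unique operands, 30 total.
measures = halstead_measures(u1=10, n1=50, u2=6, n2=30)
print(measures)
```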
User Interface Design
• Learnability is an important aspect of usability.
• Users lose up to 40% of their time to “frustrating experiences” with computers. Among the most common causes are missing, hard-to-find, and unusable features of the software.
38
User Interface Design
• Nielsen characterizes a highly learnable system as one “allowing users to reach a reasonable level of usage proficiency within a short time”.
• A web usage map is mined using label sequential rules.
39
User Interface Design
40
Implementation
• Implementation is a core process in the software engineering life cycle.
• One of the challenges in this phase is modularization (or remodularization).
• Genetic algorithms have been successfully used to address this problem.
• The objective is to improve the modularization quality (MQ). All versions of MQ combine cohesion and coupling into a single weighted fitness function.
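To make the idea of a cohesion/coupling fitness function concrete, here is a minimal sketch of one formulation found in the search-based clustering literature: each module contributes a cluster factor that rewards internal edges and penalizes edges crossing its boundary. The exact variant used in the work summarized here is not stated, so treat the formula, the function name, and the sample graph as assumptions.

```python
def modularization_quality(edges, module_of):
    """Sum of per-module cluster factors for a dependency graph.

    edges: iterable of (source, target) dependencies between code units
    module_of: mapping from code unit to the module it is assigned to
    """
    intra = {}  # edges inside each module (cohesion)
    inter = {}  # edges crossing each module's boundary (coupling)
    for src, dst in edges:
        a, b = module_of[src], module_of[dst]
        if a == b:
            intra[a] = intra.get(a, 0) + 1
        else:
            inter[a] = inter.get(a, 0) + 1
            inter[b] = inter.get(b, 0) + 1
    total = 0.0
    for module in set(module_of.values()):
        mu = intra.get(module, 0)
        if mu:  # a module with no internal edges contributes nothing
            total += 2 * mu / (2 * mu + inter.get(module, 0))
    return total


# Hypothetical dependency graph and two candidate partitions.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
good = modularization_quality(
    EDGES, {"a": "M1", "b": "M1", "c": "M1", "d": "M2", "e": "M2"})
bad = modularization_quality(
    EDGES, {"a": "M1", "d": "M1", "b": "M2", "c": "M2", "e": "M2"})
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

A genetic algorithm would treat the module assignment as the chromosome and use a function like this one as its fitness.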
42
Implementation (Cont’d)
• Clustering has also been applied to package coupling, to reduce overall package size and to explore the relationship between design and code level software structure.
• Additional objectives might include closeness to original module structure, business goals, technical constraints, testability, and other metrics that may be important in finding a good module structure.
43
Implementation (Cont’d)
44
Implementation (Cont’d)
• Refactoring rewrites existing source code to improve its readability, reusability, or structure without affecting its meaning or behavior.
• Project managers want to know which locations are likely to demand refactoring. Refactoring improves the understandability of the code but requires development time.
45
Implementation (Cont’d)
• Researchers screened evolution data from the versioning systems of open source projects.
• ArgoUML and the Spring framework are examples developed in Java; they consist of 5,000 and 10,000 classes, respectively.
• In Java each class is usually placed in a separate file, so the researchers treat files as equivalent to classes and focus on files in their analysis.
46
Implementation (Cont’d)
The used features can be divided into different categories:
• Size
This category contains size measures such as lines of code from an evolution perspective: linesAdded, linesModified, or linesDeleted relative to the total LOC (lines of code) of a file.
47
Implementation (Cont’d)
• Team
The number of authors of a file influences the way software is developed. It is expected that the more authors work on the changes, the higher the probability of rework and mistakes.
• Complexity of existing solution
According to the laws of software evolution, software continuously becomes more complex. Changes are more difficult to add because the software is more difficult to understand and the contracts between existing parts have to be retained. As a result, the researchers investigate changeCount in relation to the number of changes during the entire history of each file.
48
Implementation (Cont’d)
• New Requirements
In software development projects, new classes are usually added to object-oriented systems when new requirements have to be satisfied. The researchers use the information of whether a file was newly introduced during the prediction period.
• Relational Aspects
Among the most important features in this category are couplings, such as the number of changes/revisions in which a file was committed together with other files.
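The feature categories above can be sketched as a small extraction step over a commit history. The commit record layout, the file sizes, and the output key names other than linesAdded, linesModified, linesDeleted, and changeCount are hypothetical; the original study's exact definitions are not given in the slides.

```python
from collections import defaultdict

# Hypothetical commit history: per file, (lines added, modified, deleted).
COMMITS = [
    {"author": "ann", "changes": {"A.java": (30, 5, 0), "B.java": (2, 1, 0)}},
    {"author": "bob", "changes": {"A.java": (10, 0, 20)}},
    {"author": "ann", "changes": {"B.java": (0, 4, 1), "C.java": (50, 0, 0)}},
]
FILE_LOC = {"A.java": 200, "B.java": 100, "C.java": 50}  # current size per file


def file_features(commits, file_loc):
    """Aggregate size, team, complexity, and relational features per file."""
    lines = defaultdict(lambda: [0, 0, 0])
    authors = defaultdict(set)
    change_count = defaultdict(int)
    co_changed = defaultdict(int)
    for commit in commits:
        changes = commit["changes"]
        for path, delta in changes.items():
            for i, amount in enumerate(delta):
                lines[path][i] += amount
            authors[path].add(commit["author"])
            change_count[path] += 1
            co_changed[path] += len(changes) - 1  # other files in same commit
    return {
        path: {
            "linesAdded": lines[path][0] / loc,        # size, relative to LOC
            "linesModified": lines[path][1] / loc,
            "linesDeleted": lines[path][2] / loc,
            "authorCount": len(authors[path]),         # team
            "changeCount": change_count[path],         # complexity proxy
            "coChangedFiles": co_changed[path],        # relational aspect
        }
        for path, loc in file_loc.items()
    }


features = file_features(COMMITS, FILE_LOC)
print(features["A.java"])
```

Each row of the resulting table would then be one training example for the classifier.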
49
Implementation (Cont’d)
• With the described features, the number of refactorings is predicted
50
Implementation (Cont’d)
• Decision trees and neural networks are used as classifiers.
• The F-measure was about 65%.
• Several features, such as lines activity rate and number of lines altered per commit, provide much information for the assessment of refactorings.
• The structure of the system is also crucial for refactorings: the number of co-changed files and the number of files introduced during maintenance are relevant features.
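For readers unfamiliar with the F-measure reported above, the sketch below shows how it is computed from a classifier's confusion counts. The counts are hypothetical and are not the study's data.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 refactored files found, 10 false alarms, 20 missed.
score = f_measure(true_positives=40, false_positives=10, false_negatives=20)
print(f"F-measure = {score:.3f}")
```

Because it is a harmonic mean, the F-measure stays low unless both precision and recall are reasonably high.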
51
Testing
• Software quality models help ensure the reliability of the delivered products.
• Early detection of fault-prone software components enables verification experts to concentrate their time and resources on the problem areas of the software system under development.
• Accurate prediction of fault-prone modules enables verification and validation activities to focus on the critical software components.
53
Testing (Cont’d)
54
Testing (Cont’d)
• Decision trees correctly predicted 79.3% of high development effort fault-prone modules (detection rate), while the trees generated from the best parameter combinations correctly identified 88.4% of those modules on the average.
55
Maintenance
• Software maintenance is widely recognized to be the most expensive and time-consuming aspect of the software process.
• A relevance relation maps a tuple of system elements to a value indicating how related they are.
• Software change repositories reflect a history of the system, which includes actions that result in the creation of new relationships and the strengthening of existing relationships in the software.
57
Maintenance (Cont’d)
58
Maintenance (Cont’d)
• Software entities include documents, source files, routines, modules, variables, and even the entire software system.
• A relevance relation is a predictor that maps tuples of two or more software entities to a value r quantifying how relevant, that is, connected or related, the entities are to each other.
• r shows the strength of relevance among the entities.
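As a simple stand-in for a relevance relation over pairs of files, the sketch below scores each pair by how often the two files changed together relative to how often either changed. This co-change ratio is an assumption chosen for illustration. It is not the learned predictor the slides refer to, and the change sets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical change repository: the set of files touched by each change.
CHANGE_SETS = [
    {"parser.c", "lexer.c"},
    {"parser.c", "lexer.c", "ast.h"},
    {"ui.c"},
    {"parser.c", "ast.h"},
]


def co_change_relevance(change_sets):
    """Map each file pair to r = changes touching both / changes touching either."""
    touched = Counter()
    together = Counter()
    for files in change_sets:
        touched.update(files)
        together.update(frozenset(pair)
                        for pair in combinations(sorted(files), 2))
    return {
        pair: both / (sum(touched[f] for f in pair) - both)
        for pair, both in together.items()
    }


relevance = co_change_relevance(CHANGE_SETS)
for pair, r in sorted(relevance.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), round(r, 3))
```

A learned relevance relation would use pairs like these as labeled training examples and predict r for pairs that have little or no change history.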
59
Maintenance (Cont’d)
60
Maintenance Effort Prediction
• If the predictions are based on formal software development effort prediction models, such as the estimation part of Function Point Analysis, essential differences in characteristics between software development and software maintenance are neglected.
• The focus of software development is the creation of software, but the focus of software maintenance is more the change of software.
• The development of a software application typically is a one-of-a-kind project, but the maintenance activities on an application usually comprise a large number of tasks carried out over a long period of time in a relatively stable environment.
62
Maintenance Effort Prediction
• Some researchers collected data on:
– 109 randomly selected maintenance tasks
– 70 applications
– The size of the applications varied from a few thousand lines of code (LOC) to about 500,000 LOC
– The age of the applications varied from less than a year to more than 20 years
– The functions of the applications included payroll, order entry, billing and invoicing, inventory control, service management, and personnel administration.
63
Maintenance Effort Prediction
The following data was collected for each maintenance task:
• Type of maintenance task, i.e., corrective or perfective.
• Priority of task, i.e., high, medium or low priority.
• Maintainer’s knowledge and confidence about how to solve the task immediately after having read or heard the task specification.
• Years of experience as maintainer, and on the maintained application.
• Education level of the maintainer.
• Work-hours (effort) spent on the task.
• Task size and the programming language.
• Age and size of the changed application.
64
Maintenance Effort Prediction
Most important features:
• Cause: Corrective maintenance = 0, otherwise = 1
• Change: More than 50% of the effort is believed to be spent on updating code, as opposed to inserting and deleting code = 0, otherwise = 1
• Mode: More than 50% of the effort is believed to be spent on development of new modules (New module mode) = 0, otherwise (Embedded mode) = 1
• Confidence: The maintainer believes he knows how to solve the task when the task specification is read/heard the first time = 0 (High confidence), otherwise = 1 (Medium or low confidence).
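The four binary features above can be encoded as follows before being passed to a regression model or neural network. The record type and its field names are hypothetical; only the 0/1 coding rules come from the slide.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceTask:
    """Hypothetical raw record for one maintenance task."""
    is_corrective: bool
    update_effort_share: float      # fraction of effort spent updating code
    new_module_effort_share: float  # fraction spent developing new modules
    high_confidence: bool           # maintainer's first impression


def encode(task):
    """Apply the 0/1 coding rules for Cause, Change, Mode, and Confidence."""
    return {
        "Cause": 0 if task.is_corrective else 1,
        "Change": 0 if task.update_effort_share > 0.5 else 1,
        "Mode": 0 if task.new_module_effort_share > 0.5 else 1,
        "Confidence": 0 if task.high_confidence else 1,
    }


# Hypothetical task: a bug fix done mostly by updating existing code.
encoded = encode(MaintenanceTask(
    is_corrective=True,
    update_effort_share=0.7,
    new_module_effort_share=0.1,
    high_confidence=False,
))
print(encoded)
```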
65
Maintenance Effort Prediction
Less influential features:
• Type of language
• Maintainer experience
• Task priority
• Application age
• Application size
66
Maintenance Effort Prediction
• Neural network and regression were used as approaches for effort prediction.
• The prediction accuracy was acceptable (error of 60%).
• A recommended use of an effort prediction model is, therefore, to support the expert predictions.
• Another important use of a formal prediction model may be to support the collection and analysis of maintenance data in order to enable improvement of the maintenance process and product.
67
Open Problems
• Most of the presented work is immature, and many related issues are still open.
• Machine learning can help in the requirements engineering phase in developing knowledge-based systems and ontologies to manage the requirements and model problem domains.
68
Open Problems (Cont’d)
• One of the most difficult problems is the problem of transforming requirements into architectures. Much research is needed in this area to address the ever increasing complexity of functional and non-functional requirements.
69
Open Problems (Cont’d)
• One area that has received some attention is the use of automated algorithms with machine learning to make repair assignments.
• In any case, more studies with respect to the appropriate criteria for selecting assignment policy, reward mechanisms and management goals need to be undertaken.
70
Conclusion
• The existing work certainly proves that the field of software engineering is a fertile ground for the application of machine learning methods.
• It is clear that there is an increased interest in the niche area of machine learning and software engineering.
71
Conclusion (cont’d)
• The strength of machine learning methods lies in the fact that they have sound mathematical and logical justifications
• The power of machine learning methods does not come from a particular induction method, but instead from proper formulation of the problems and from crafting the representation to make learning tractable.
72
Conclusion (cont’d)
• Machine learning can play a useful role in the different phases of software engineering: project planning, requirements analysis, design, implementation, testing, and even maintenance.
• Interest in applying machine learning to software engineering tasks is expected to increase significantly, especially with the increasing interest in empirical software engineering.
73
Thank you very much
74