19/04/23 Dr Andy Brooks 1
Lectures 17 & 18
Does Code Decay?
“As part of our experience with the production of software for a large telecommunications system, we have observed a nearly unanimous feeling among developers of the software that the code degrades through time and maintenance becomes increasingly difficult and expensive.”
Eick et al, 1998
MSc Software Maintenance
Case Study
Reference: Does Code Decay? Assessing the Evidence from Change Management Data, Stephen G Eick, Todd L Graves, Alan F Karr, J S Marron, and Audris Mockus, NISS-TR-81 (1998), National Institute of Statistical Sciences, 19 T. W. Alexander Drive, PO Box 14006, Research Triangle Park, NC 27709-4006, USA. http://www.niss.org/technicalreports/tr81.pdf
“Whether this code decay is real, how it can be characterized, and the extent to which it matters are the questions we address in this paper.”
Eick et al, 1998
Previous Work
“Early investigations of aging in large software systems by Belady and Lehman [2], [3], [4] reported the near impossibility of adding new code to an aged system without introducing faults.”
Eick et al, 1998
Access To Large Data Set
• Entire change management history of a 15 year old, real-time, software system for telephone switches:
  – 100,000,000 lines of code
    • C, C++, proprietary state description language
  – 100,000,000 lines of header and make files
  – Some 50 major subsystems and 5,000 modules
    • Here, a module is a directory containing several files.
  – Each release is some 20,000,000 lines of code
• 10,000 developers have been involved.
Categories Of Change
• Adaptive
  – new functionality (e.g. caller ID)
  – adaptations to new hardware or other changes in environment
• Corrective
  – fixing faults
• Perfective
  – improve maintainability of software
    • reengineering (refactoring)
Change Process
• A new feature (e.g. call waiting) involves hundreds of Initial Modification Requests (IMRs).
• Each IMR results in a number of Modification Requests (MRs).
• Developers open MRs, perform the changes, and make limited checks that the changes are satisfactory.
  – Inspections and integration and system tests follow.
• An editing change to a single file is captured as a delta.
  – Lines added and deleted are tracked separately.
  – Line edits involve first deletion, then addition.
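The hierarchy described on this slide (IMR, MR, delta) can be sketched as a small data model. This is a minimal illustration in Python, not the schema of the actual change management system; every class, field, and identifier below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Delta:
    """One editing change to a single file (field names are illustrative)."""
    file: str
    lines_added: int = 0
    lines_deleted: int = 0


@dataclass
class ModificationRequest:
    """An MR groups the deltas made for one change."""
    mr_id: str
    deltas: List[Delta] = field(default_factory=list)


@dataclass
class InitialModificationRequest:
    """An IMR results in a number of MRs."""
    imr_id: str
    mrs: List[ModificationRequest] = field(default_factory=list)


def edit_line(file: str) -> Delta:
    # A line edit is recorded as one deletion followed by one addition.
    return Delta(file=file, lines_added=1, lines_deleted=1)


mr = ModificationRequest("MR-1", [edit_line("a.c"), Delta("b.c", lines_added=10)])
imr = InitialModificationRequest("IMR-1", [mr])

# Added and deleted lines are tracked separately, so they are summed separately.
total_added = sum(d.lines_added for m in imr.mrs for d in m.deltas)
total_deleted = sum(d.lines_deleted for m in imr.mrs for d in m.deltas)
print(total_added, total_deleted)
```

Keeping additions and deletions as separate counters mirrors the slide's point that an edited line shows up once in each.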
Data Tracked By Version Management System
• 89 fields, including priority, date opened, and date closed
• problem
• solution (change & reasons)
Answering Questions About Change Data
D - directly from version management database
A - by aggregation over constituent parts
D* - problematic aspects
What files were changed, How many modules, files, and lines were affected?...
What Is Code Decay?
• “Code is decayed if it is more difficult to change than it used to be.”
  – But an increase in the difficulty of making changes may result from an increase in the inherent difficulty of the requested changes.
• Decayed code does not mean that the software fails to meet current requirements.
  – Decayed code means it is difficult to add new functionality or make other changes.
What Is Code Decay?
• Decayed code may have increased value.
  – The changes that have caused the decay mean more functionality for the customer.
• A code unit can decay as a result of changes elsewhere in the software.
• A code unit can be inherently complex, so attributing the difficulty of making a change to decay can be misleading.
Individual Ability
• Making changes is less difficult for a more able software maintainer.
• Making changes is more difficult for a junior software maintainer.
• “A definitive adjustment for developer ability has not been devised and usually we must relegate developer variability to ‘noise’ terms in our models.”
Causes Of Decay
1. Inappropriate architecture
  • changes have wide scope
2. Violation of original design principles
  • fixed phone -> mobile/fixed phone
3. Imprecise requirements
  • ‘crisp code’ not produced
4. Time pressure
  • short-cuts, sloppy code, kludges
  • limited code understanding
Causes Of Decay
5. Inadequate programming tools
6. Organizational environment
  • excessive staff turnover
  • developers fail to communicate properly
7. Programmer variability
  • weak programmers may not understand complex code written by more able colleagues
8. Inadequate change process
  • missing version control
  • handling changes in parallel
Medical Metaphor
• The software is a patient with a disease called code decay.
• What are the causes of the disease?
  – changes made to the code
• What are the disease symptoms?
• What are the prognoses if you have the disease?
• What are the relevant risk factors for the disease?
Symptoms Of Code Decay
1. Excessively complex code
  • useful metrics:
    • standard software complexity metrics?
    • # loops & conditionals enclosing a line?
2. A history of frequent changes
  • also known as ‘code churn’
3. A history of faults
  • fault fixes themselves may not be examples of good programming
Symptoms Of Code Decay
4. Widely dispersed changes
  • Changes to well-engineered code tend to be local (within a class).
5. Kludges
  • Changes made knowing they could have been done more elegantly or more efficiently.
6. Numerous interfaces (entry points)
  • Possible side-effects of changes elsewhere.
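One of the complexity metrics suggested under symptom 1, the number of loops and conditionals enclosing a line, can be sketched as follows. The system studied was written in C and C++; this illustration parses Python source with the standard `ast` module only because that keeps the example self-contained. The function name `nesting_depths` is hypothetical.

```python
import ast

# Constructs counted as "enclosing" a line in this sketch.
NESTING_NODES = (ast.For, ast.While, ast.If)


def nesting_depths(source: str) -> dict:
    """Map each statement's line number to the number of enclosing loops/conditionals.

    Note: an `elif` is parsed as an `if` nested in the `else` branch,
    so it is counted one level deeper than the `if` it belongs to.
    """
    depths = {}

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                depths[child.lineno] = depth
            inner = depth + 1 if isinstance(child, NESTING_NODES) else depth
            visit(child, inner)

    visit(ast.parse(source), 0)
    return depths


src = "for i in range(3):\n    if i % 2:\n        print(i)\ntotal = 0\n"
depths = nesting_depths(src)
print(max(depths.values()))
```

A per-module summary, such as the maximum or mean depth, could then be tracked over time alongside the other symptoms.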
Risk Factors For Code Decay
Risk factors increase the chance of decay or worsen its effect.
1. Size of module m
  • NCSL(m), number of noncommentary source lines
2. Age of code
  • but very stable code might never be changed
  • variability of age within a code unit may be the key characteristic
3. Inherent complexity
  • real-time software is more likely to decay
4. Organizational churn
  • company knowledge base degraded
  • inexperienced developers make changes
Risk Factors For Code Decay
Risk factors increase the chance of decay or worsen its effect.
5. Ported or reused code
  • Ariane 5 crash was caused by reused code from Ariane 4
    • http://edition.cnn.com/WORLD/9606/04/rocket.explode/
6. Requirements load
  • very many requirements are difficult to understand and implement
7. Inexperienced developers
  • lack of knowledge
  • lack of understanding of system architecture (3-tier?)
Code Decay Indices (CDIs) notation
• c for changes (MRs)
• l for lines of code
• f for files
• m for modules
• c->m means ‘c touches m’
  – Part of m is changed by c.
• 1{A}
  – equals 1 if event A occurs
  – equals 0 otherwise
Code Decay Indices (CDIs) notation
• DELTAS(c)
  – number of deltas associated with c
• ADD(c)
  – number of lines added by c
• DEL(c)
  – number of lines deleted by c
• DATE(c)
  – date on which c is completed
• INT(c)
  – the calendar time required to implement c
• DEV(c)
  – number of developers implementing c
Historical Count Of Changes
• The number of changes to a module m in the time interval I:
$$\mathrm{CHNG}(m, I) = \sum_{c \to m} 1\{\mathrm{DATE}(c) \in I\}$$
• With |I| indicating the length of time interval I, the frequency of changes is:
$$\mathrm{FREQ}(m, I) = \frac{1}{|I|}\,\mathrm{CHNG}(m, I)$$
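The two indices on this slide amount to a filtered count and a division by the interval length. A minimal sketch, assuming change records are available as tuples of identifier, completion date, and set of touched modules (a hypothetical layout), with |I| measured in years and the interval taken as half-open:

```python
from datetime import date

# Hypothetical change records: (change id, completion date, modules touched).
changes = [
    ("c1", date(1990, 3, 1), {"m1"}),
    ("c2", date(1990, 9, 15), {"m1", "m2"}),
    ("c3", date(1992, 1, 10), {"m1"}),
]


def chng(module, start, end, changes):
    """CHNG(m, I): number of changes touching `module` with DATE(c) in [start, end)."""
    return sum(1 for _, done, mods in changes
               if module in mods and start <= done < end)


def freq(module, start, end, changes):
    """FREQ(m, I) = CHNG(m, I) / |I|, with |I| measured in years (an assumption)."""
    years = (end - start).days / 365.25
    return chng(module, start, end, changes) / years


interval = (date(1990, 1, 1), date(1992, 1, 1))
print(chng("m1", *interval, changes), freq("m1", *interval, changes))
```

Here c3 falls outside the interval, so only two changes to m1 are counted over roughly two years.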
Span Of Changes (Scope Of Changes)
• The span is the number of files touched by a change:
$$\mathrm{FILES}(c) = \sum_{f} 1\{c \to f\}$$
• Changes touching more files are more difficult because:
  i. The maintainer might have to spend time understanding unfamiliar files.
  ii. Code interfaces might have to be modified.
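The span of a change can be computed from a delta log by collecting the distinct files each change touches. A minimal sketch, assuming the log is a list of (change id, file path) pairs; the paths and identifiers are made up for illustration:

```python
from collections import defaultdict

# Hypothetical delta log: one (change id, file path) pair per delta.
deltas = [
    ("c1", "switch/route.c"),
    ("c1", "switch/route.c"),   # a second delta to the same file counts once
    ("c1", "switch/route.h"),
    ("c2", "billing/rate.c"),
]


def files_per_change(deltas):
    """FILES(c): number of distinct files touched by each change c."""
    touched = defaultdict(set)
    for change_id, path in deltas:
        touched[change_id].add(path)
    return {c: len(paths) for c, paths in touched.items()}


span = files_per_change(deltas)
print(span)
```

Using a set per change is what distinguishes FILES(c) from DELTAS(c): several deltas to one file still contribute a span of one.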
Size
• The size of a module m is NCSL(m), obtained by summing NCSL(f) over all files f in m.
• “most standard software complexity metrics are almost perfectly correlated with NCSL in our data sets”
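As a rough illustration of the size measure, the sketch below counts noncommentary source lines in C-style text and sums them over the files of a module. It is a simplification, not the counting tool used in the study: comment markers inside string literals are not handled, and the function names are hypothetical.

```python
def ncsl(source: str) -> int:
    """Count lines that contain something other than whitespace or comments.

    Simplification: comment markers inside string literals are not handled.
    """
    count = 0
    in_block = False
    for line in source.splitlines():
        code = []
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    break            # rest of the line is inside the comment
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                break                # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        if "".join(code).strip():
            count += 1
    return count


def module_ncsl(files: dict) -> int:
    """NCSL(m): sum of per-file counts over all files in the module."""
    return sum(ncsl(text) for text in files.values())


sample = ("/* header\n   comment */\nint x = 0; // counter\n\n"
          "int f(void) { /* inline */ return x; }\n// trailing\n")
print(ncsl(sample), module_ncsl({"a.c": sample, "b.c": "int y;\n"}))
```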
Age
• AGE(m)
  – the average age of constituent lines
• Variability in line ages is also of interest
• The tool SeeSoft produces a visualization of the variability in line ages:
  – files represented by boxes
  – lengths of lines in the boxes proportional to the number of characters
  – files that change little have mostly a single colour
  – files that have been changed a lot are multi-coloured
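AGE(m) and the variability of line ages can be computed directly once each line carries an age. The sketch below assumes that a line's age is the time in years since it was last changed and uses the population standard deviation as the measure of variability; both choices are assumptions made for illustration, and the data are invented.

```python
from statistics import mean, pstdev

# Hypothetical per-line ages in years (time since each line was last changed).
stable_module = {"stable.c": [6.0, 6.0, 6.0, 6.0]}
churned_module = {"churned.c": [0.5, 6.0, 2.0, 3.5]}


def age_summary(ages_by_file):
    """Return (AGE(m), spread): mean line age and its population standard deviation."""
    all_ages = [a for ages in ages_by_file.values() for a in ages]
    return mean(all_ages), pstdev(all_ages)


print(age_summary(stable_module))
print(age_summary(churned_module))
```

A module whose lines all date from the same period has zero spread, which corresponds to the single-colour boxes in the SeeSoft view; a heavily changed module shows a wide spread.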
SeeSoft View Of One Module
SubSystem Under Analysis
• 100 modules
• 2,500 files
• 6,000 IMRs
• 27,000 MRs
• 130,000 deltas
• 500 different login names made code changes to the subsystem
X 100
Temporal Behavior Of The Span Of Changes
• The probability that a change will touch more than one file more than doubles, from less than 2% in 1989 to more than 5% in 1996.
• Ripples in the high-resolution smooth are not statistically significant.
[Figure: span of changes plotted against date, 1989 to 1996, smoothed with different window widths; annotation: "initial development".]
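The quantity tracked on this slide is the share of changes that touch more than one file. The sketch below computes raw yearly proportions from invented data; it does not reproduce the smoothing with different window widths used for the figure, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical (year completed, FILES(c)) pairs, one per change.
changes = [(1989, 1), (1989, 1), (1989, 1), (1989, 2),
           (1996, 1), (1996, 3), (1996, 1), (1996, 2)]


def multi_file_share(changes):
    """Per year, the fraction of changes that touch more than one file."""
    total, multi = Counter(), Counter()
    for year, n_files in changes:
        total[year] += 1
        if n_files > 1:
            multi[year] += 1
    return {year: multi[year] / total[year] for year in sorted(total)}


shares = multi_file_share(changes)
print(shares)
```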
Breakdown In Modularity?
• Alone, the increase in span of changes does not imply a breakdown in the modularity of the subsystem.
• The increase could simply reflect the growth of the subsystem, and changes with a wide span need not cross module boundaries.
Network Visualization Tool NicheWorks
• Each tadpole shape corresponds to a module.
  – The tadpole tail indicates the picture at the end of the previous year.
• Pairs of modules are placed nearby if they have been changed together as part of the same MRs a large number of times.
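The placement rule rests on how often two modules are changed by the same MR. A minimal sketch of that co-change count, assuming each MR is available with the set of modules it touched (the data and function name are hypothetical, and the layout step performed by NicheWorks itself is not shown):

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of modules touched by each MR.
mr_modules = {
    "MR-1": {"m1", "m2"},
    "MR-2": {"m1", "m2", "m3"},
    "MR-3": {"m3"},
}


def co_change_counts(mr_modules):
    """Count, for each unordered module pair, the MRs that changed both."""
    counts = Counter()
    for modules in mr_modules.values():
        # Sorting makes ("m1", "m2") and ("m2", "m1") the same key.
        for pair in combinations(sorted(modules), 2):
            counts[pair] += 1
    return counts


pairs = co_change_counts(mr_modules)
print(pairs.most_common(1))
```

Pairs with high counts would be drawn close together; an MR touching a single module contributes to no pair.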
NicheWorks View Of The SubSystem Modules
[Figure: NicheWorks views of the subsystem modules in 1988, 1989, and 1996.]
The architecture that separated the functionality of two clusters of modules is breaking down.
Alternative Interpretation
The inherent difficulty of the desired changes could have been increasing.
The modification request data are not examined independently from this perspective.
Examples of desired changes: provide an extra area-code digit; implement caller-ID.
Prediction Of Faults (Quality Prognosis)
• The best model derived from the data predicts numbers of faults using numbers of changes to the module in the past.
• Large recent changes add most to the fault potential.
$$\mathrm{FP}_{\mathrm{WTD}}(m) = \sum_{c \to m} e^{0.75 \times \mathrm{DATE}(c)} \log\{\mathrm{ADD}(c, m) + \mathrm{DEL}(c, m)\}$$
• Parameter 0.75 was determined by statistical analysis.
• The number of times a module has been changed is a better predictor than size.
• The number of developers working on a module had no effect on fault potential.
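The weighting idea on this slide, that large recent changes add most to the fault potential, can be sketched as an exponentially weighted sum of the log of lines added plus lines deleted. This is an illustration under stated assumptions, not the fitted model from the study: dates are taken as fractional years, the parameter 0.75 is applied per year, and dates are measured relative to a reference time `now` so that the exponential stays in floating-point range (which rescales every module by the same constant). The function name is hypothetical.

```python
import math

ALPHA = 0.75  # parameter quoted on the slide; applied per year here (assumption)


def fault_potential(changes, now):
    """Weighted sum over the changes touching one module.

    `changes` holds (date_in_years, lines_added, lines_deleted) tuples.
    Each change contributes exp(ALPHA * (date - now)) * log(added + deleted),
    so recent and large changes carry the most weight.
    """
    return sum(
        math.exp(ALPHA * (when - now)) * math.log(added + deleted)
        for when, added, deleted in changes
        if added + deleted > 0          # log is undefined for empty changes
    )


recent = [(1996.0, 50, 50)]
old = [(1990.0, 50, 50)]
print(fault_potential(recent, 1996.0) > fault_potential(old, 1996.0))
```

With identical change sizes, the six-year-old change contributes only a small fraction of what the recent one does, which matches the slide's emphasis on recency.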
Prediction Of Effort (Effort Prognosis)
• “Can the effort required to implement changes be predicted from symptoms and risk factors for decay?”
• Effort data, available only at the feature level, displayed extreme variability, so the results are suggestive only:
  – A dependency on FILES(c) was discovered, supporting the idea that the span of changes is a symptom of decay.
• Some changes involved a small number of deltas but required close to maximum effort.
Summary: Eick et al.
Four analyses demonstrate:
1. “The increase over time in the number of files touched per change to the code.
2. The decline in modularity of a subsystem of the code, as measured by changes touching multiple modules.
3. Contributions of several factors (notably, frequency and recency of change) to fault rates in modules of code, and
4. That span and size of changes are important predictors (at the feature level) of the effort to implement a change.”
Summary: Eick et al.
• The system studied showed no evidence of dramatic, widespread decay:
  – In seven years, the probability of a change touching more than 1 file increased only from 2% to 5%.
  – The architecture that separated the functionality of two clusters of modules is breaking down.
• Can code decay prove fatal?
  – “there are anecdotal reports of systems that have reached a state from which further change is not possible”
Modification Request Difficulty
• The nature of the modification requests over time was not analysed, so alternative interpretations of the data set cannot be ruled out.
• How can you measure the inherent difficulty of a modification request?
  – By the span of changes?
  – By the complexity of the textual description & justification?
• The temporal behaviour of the span of changes could be due to the inherent difficulty of modification requests increasing with time.