Lecture 6 & 7 · users.ece.utexas.edu/~miryung/teaching/EE382V-Fall2009/PDF-Lectu… · EE382V Softwar...

EE382V Software Evolution: Spring 2009, Instructor Miryung Kim Lecture 6 & 7 Empirical Studies of Software Evolution: Code Decay


  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Lecture 6 & 7: Empirical Studies of Software Evolution: Code Decay

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Announcement

    • Your proposals have been graded.

    • Literature Survey & Tool Evaluation: Each in the range of 1-4

    • Project Proposal: Each in the range of 1-8

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Today’s Agenda

    • Divya’s presentation on the code decay paper

    • Discuss Belady & Lehman’s paper

    • Discuss Code Decay paper

    • Q and A session on my feedback to your proposals

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Today’s Presenter

    • Divya Gopinath (Skeptic)

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    A Model of Large Software Development

    • L.A Belady and M.M Lehman

    • 1976

    • IBM OS/360

    • Seminal empirical study paper in software evolution

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    What was it like in 1976?

    • IBM PC came out in 1981

    • Apple introduced its first personal computers in the late 1970s

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    What was it like in 1976?

    • E.W. Dijkstra: a program, like a mathematical theorem, should and can be provable

    • Increasing cost of building and maintaining software was alarming

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Subject Program & Data

    • OS/360

    • Roughly a decade old in 1976 (OS/360 was announced in 1964)

    • 20 user-oriented releases

    • Starting with the available data, they attempted to deduce the nature of consecutive releases of OS/360

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    When observing software evolution, what can you measure?

    • # of bugs (reported problems)

    • # of modules => # of directories, # of files

    • performance metrics => times, memory usage, CPU usage, IPC

    • change types: corrective, adaptive, perfective

    • # of developers working in the organization, # of developers per module

    • size of code changes => # of lines of changed code

    • how widespread the changes are => # of files or modules touched by the same change

    • age in calendar year, days, months / age in logical unit such as release, check-in, version

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Variables observed in Belady and Lehman 1976

    • The release number

    • Days between releases

    • The size of the system

    • The number of modules added, deleted, and changed

    • Complexity: the fraction of the released system's modules that were handled during the course of release r, i.e., MH_r / M_r

    • Manpower, machine time, and costs involved in each release
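The complexity measure listed above can be made concrete with a small sketch. The record layout, field names, and numbers below are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """One release with a subset of the observed variables (field names are hypothetical)."""
    number: int            # release sequence number r
    days_since_prev: int   # days between this release and the previous one
    modules: int           # M_r: size of the system in modules
    modules_handled: int   # MH_r: modules handled during the release


def complexity(release: Release) -> float:
    """Fraction of the released system's modules handled in the release (MH_r / M_r)."""
    if release.modules <= 0:
        raise ValueError("a release must contain at least one module")
    return release.modules_handled / release.modules


# Made-up numbers for illustration only.
history = [
    Release(number=1, days_since_prev=90, modules=1000, modules_handled=150),
    Release(number=2, days_since_prev=120, modules=1200, modules_handled=300),
]
fractions = [complexity(r) for r in history]
```

Under these made-up numbers the handled fraction rises from one release to the next, which is the kind of trend the complexity variable is meant to expose.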

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Growth Trends of System Attribute Counts With Time

    [Figure 1: Growth trends of system attribute counts with time. Labels recovered: modules handled, components, time.]

    [Figure 2: Average growth trends of system attributes, plotted against average release sequence number. Legend: size in modules, release interval, modules handled.]

    The scientific method has made progress in revealing the nature of the physical world by pursuing courses other than studying individual phenomena in exquisite detail. Similarly, a system, a process, or a phenomenon may be viewed from the outside, by acts of observing; clarifying; and by measuring and modeling identifiable attributes, patterns, and trends. From such activities one obtains increasing knowledge and understanding, based on the behavior of both the system and its subsystems, the process and its subprocesses.

    Starting with the initial release of OS/360 as a base, we have studied the interaction between management and the evolution of OS/360 by using certain independent variables of the improvement and enhancement (i.e., maintenance) process. We cannot say at this time that we have used all the key independent variables. There is undoubtedly much more to be learned about the variables and the data that characterize the programming process. Our method of study has been that of regression, outside in, which we have termed “structured analysis.” Starting with the available data, we have attempted to deduce the nature of consecutive releases of OS/360. We give examples of the data ...

    (Belady and Lehman, IBM Syst. J., p. 226)

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Growth Trends


    ... that support this systematic study of the programming process. Again, however, we wish to emphasize that this study is but the beginning of a new approach to analyzing man-made systems.

    The authors have studied the programming process as it pertains to the development of OS/360, and now give a preliminary analysis of some project statistics of this programming system, which had already survived a number of versions or releases when the study began. The data for each release included measures of the size of the system, the number of modules added, deleted or changed, the release date, information on manpower and machine time used and costs involved in each release. In general there were large, apparently stochastic, variations in the individual data items from release to release.

    All in all, the data indicated a general upward trend in the size, complexity, and cost of the system and the maintenance process, as indicated by components, modules, statements, instructions, and modules handled in Figure 1. The various parameters were averaged to expose trends. When the averaged data were plotted as shown in Figure 2, the previously erratic data had become strikingly smooth.

    Some time later, additional data were plotted as shown in Figure 3 and confirmed suspicions of nonlinear (possibly exponential) growth and complexity. Extrapolation suggested further growth trends that were significantly at odds with the then current project plans. The data were also highly erratic with major, but apparently serially correlated, fluctuations shown in Figure 4 by the broken lines from release to release. Nevertheless, almost any form of averaging led to the display of very clear trends as shown by the dashed line in Figure 4. Thus it was natural to apply uni- and multivariate regression and autocorrelation techniques to fit appropriate regression and time-series models to represent the process for purposes of planning, forecasting, and improving it in part or as a whole. As the study progressed, evidence accumulated that one might consider a software maintenance and enhancement project as a self-regulating organism, subject to apparently random shocks, but, overall, obeying its own specific conservation laws and internal dynamics.

    Thus these first observations encouraged the search for models that represented laws that governed the dynamic behavior of the metasystem of organization, people, and program material involved in the creation and maintenance process, in the evolution of programming systems.

    It is perhaps necessary to explain here why we allege continuous creation, maintenance, and enhancement of programming systems. It is the actual experience of all who have been in- ...

    [Margin headings on this page: the programming process; laws of program evolution]

    [Figure 3: Average growth trends of system attributes compared with planned growth, plotted against average release sequence number. Legend: size in modules, release interval, modules handled.]

    [Figure 4: Serial and average growth trends of a particular attribute, plotted against release sequence number. Legend: trend, average.]

    (Belady and Lehman, Large-Program Development, No. 3, 1976, p. 227)


  • What can you deduce from these graphs?

    • Each successive release takes longer to ship

    • The size of the system increases over time

    • Each release requires modifying more and more modules


  • What can you deduce from these graphs?

    • Fewer modules are handled as the software evolves


  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Belady & Lehman: the Law of Program Evolution Dynamics

    1. Law of continuing change: a system that is used undergoes continuing change until it is judged more cost effective to freeze and recreate it

    2. Law of increasing entropy: the entropy of a system (its unstructuredness) increases with time, unless specific work is executed to maintain or reduce it.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Belady & Lehman: the Law of Program Evolution Dynamics

    3. Law of statistically smooth growth: Growth trend measures of global system attributes may appear to be stochastic locally in time and space, but statistically, they are cyclically self-regulating, with well-defined long-range trends

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Belady & Lehman: the Law of Program Evolution Dynamics

    • Law of continuing change: a system that is used undergoes continuing change until it is judged more cost effective to freeze and recreate it

    • ΔM_r = 200 + S1 + Z1

    • Law of increasing entropy: the entropy of a system (its unstructuredness) increases with time, unless specific work is executed to maintain or reduce it.

    • C_r = 0.14 + 0.0012R^2 + S2 + Z2

    • Law of statistically smooth growth: Growth trend measures of global system attributes may appear to be stochastic locally in time and space, but statistically, they are cyclically self-regulating, with well-defined long-range trends

    • M_r = 760 + 200 R + S + Z (where R is the release sequence number, and S and Z represent the cyclic and stochastic components)
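As a rough illustration of what these formulas predict, the sketch below evaluates only the trend terms, assuming the cyclic and stochastic components (S, Z and their numbered variants) are set to zero. The function names are hypothetical.

```python
def module_trend(r: float) -> float:
    """Trend part of M_r = 760 + 200 R + S + Z, with S and Z set to zero."""
    return 760 + 200 * r


def complexity_trend(r: float) -> float:
    """Trend part of C_r = 0.14 + 0.0012 R^2 + S2 + Z2, with S2 and Z2 set to zero."""
    return 0.14 + 0.0012 * r ** 2


# Print the trend values for a few release sequence numbers.
for r in (1, 10, 20):
    print(r, module_trend(r), round(complexity_trend(r), 4))
```

Note that consecutive values of `module_trend` differ by exactly 200 modules, which agrees with the constant increment of 200 in the first formula, while the complexity trend grows quadratically in R.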

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    As a skeptic:

    • Laws are presumptive

    • How do you use the laws in real, day-to-day software development?

    • External validity: does the law hold for other projects?

    • Factors:

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    As a skeptic:

    • What is the unit of a module and a component?

    • What is the granularity of a release? Does each release add the same amount of functionality?

    • What types of changes does each release include?

    • Any changes in the organization structures & developers?

    • Are they laws or just hypotheses?

    • What are potential contributions / benefits of understanding software evolution?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    My general thoughts on Belady & Lehman

    • A very insightful paper for 1976

    • The first use of statistical regression for characterizing software evolution

    • Discussed the nature of software evolution, characterized it using their empirical data

    • Deduction of laws from one system’s evolution: very weak external validity, perhaps hasty conclusions

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Does Code Decay?

    • Eick et al.

    • TSE 2001 (almost 25 years after Belady & Lehman’s Study)

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Problem Definition

    • What do the authors mean by “code decay?”

    • Code has decayed if it is harder to change than it should be

    • related to Belady & Lehman’s second law: the entropy of a system increases with time, unless specific work is executed to maintain or reduce it.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Discussed Problem

    • Check whether code decay is real: “Does Code Really Decay?”

    • how code decay can be characterized

    • the extent to which each risk factor matters

    • *Empirical Study* Paper

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Hypotheses

    • What are the authors trying or expecting to find?

    • The span of changes (the number of files a change touches) increases over time (age)

    • Effort is related to many measurable variables.

    • Modularity breaks down over time

    • Fault potential is related to many measurable variables.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Hypotheses

    • What are the authors trying or expecting to find?

    1. The span of changes increases over time

    2. Breakdown of modularity increases over time

    3. Fault potential, the likelihood that changes induce faults, has some relation to ...

    4. Effort has some relationship too

    • A *good* empirical study paper usually either finds surprising empirical evidence that contradicts conventional wisdom or provides thorough empirical evidence that validates well-known hypotheses.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Study Approach

    • Data selection

    • Selection of measurement variables (the so-called independent variables)

    • Study method that finds *relationships* among the measurement variables

  • Study Approach: (1) Data Selection

    • Rich data set

    • Telephone switching system

    • 100 million LOC

    • 5000 modules

    • 50 major subsystems

    • in C and C++

    [Excerpt from the paper shown on the slide]

    ... we show that the distribution of faults is explained by the distribution of large, recent changes to the software ... prediction of effort. Related to our fault CDI, Ohlsson and Alberg […] identify fault-prone modules in switching system software.

    Two early fundamental papers relating software data collection and its analysis are […], […].

    CHANGES TO SOFTWARE

    Our definition of a change to software is driven by the data that are available: A change is any alteration to the software recorded in the change history data base. The specific data with which we deal are described in Section … and Section ….

    The changes we study fall naturally into three main classes (see […] and […]) that define the evolution of a software product. Adaptive changes add new functionality to a system (for example, caller ID in a telephone switch), or adapt the software to new hardware or other alterations in its environment. Corrective changes fix faults in the software. Perfective changes are intended to improve the developers' ability to maintain the software without altering functionality or fixing faults. Perfective maintenance has also been called “maintenance for the sake of maintenance” or “reengineering.”

    The Change ...

    ... out” of the version management system, edited, and then “checked in.” Lines added and lines deleted by a delta are tracked separately. (To change a line, a developer first deletes it, then adds the new version of the line.)

    A major organizational paradigm shift (see […]) for one of the organizations working on the system during its lifetime has been a transition from developer ownership of modules (with a feature implemented by all developers who own modules that are touched) to developer ownership of features, with the feature owner(s) making changes wherever necessary. Implications of this are discussed in Sections … and ….

    Change Management Data

    Data pertaining to the change history of the code itself reside in a version management system, which tracks changes at the feature, IMR, MR, and delta levels. Within the version management system, the structure of the changes is as follows (see Fig. …).

    Each IMR has an extensive record containing priority, date opened and closed, point in the development process when it was initiated (re...

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Study Approach: (2) Measure Independent Variables

    • c denotes a change (usually an MR, a modification request)

    • Variables

    • DELTAS(c) = # of deltas associated with c

    • ADD(c) = # of lines added by c

    • DEL(c) = # of lines deleted by c

    • DATE(c) = the date on which c is completed

    • INT(c) = the interval of c (calendar time required to implement c)

    • DEV(c) = number of developers implementing c
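One way to picture these per-change variables is as fields of a single record. This is a minimal sketch: the class, its field names, and the `opened` date (used here to derive the interval) are assumptions for illustration, not the layout of the actual version management database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Change:
    """A change c, usually an MR (field names are hypothetical)."""
    deltas: int        # DELTAS(c): number of deltas associated with c
    added: int         # ADD(c): lines added by c
    deleted: int       # DEL(c): lines deleted by c
    opened: date       # assumed: the date work on c started
    completed: date    # DATE(c): the date on which c is completed
    developers: int    # DEV(c): number of developers implementing c

    @property
    def interval_days(self) -> int:
        """INT(c): calendar time required to implement c, in days."""
        return (self.completed - self.opened).days


# Made-up example change.
c = Change(deltas=3, added=40, deleted=12,
           opened=date(1995, 3, 1), completed=date(1995, 3, 15), developers=2)
```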

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Study Approach: (2) Measure Independent Variables

    • Derived variables

    • FREQ(m, I) = (# of changes c that touch module m with DATE(c) in interval I) / |I|

    • FILES(c) = # of files touched by change c

    • NCSL(m) = # of non-commentary source lines in module m

    • AGE(m) = average age of the constituent lines of module m
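Two of the derived variables can be sketched over a toy change history. The sketch assumes that FREQ(m, I) counts the changes touching module m whose completion date falls in interval I, divided by the length of I in days; the data, module names, and file names are made up.

```python
from datetime import date

# Each change: (completion date, set of (module, file) pairs it touches). All hypothetical.
changes = [
    (date(1995, 1, 10), {("billing", "rates.c")}),
    (date(1995, 2, 20), {("billing", "rates.c"), ("routing", "paths.c")}),
    (date(1996, 6, 5), {("routing", "paths.c"), ("routing", "table.c")}),
]


def files(change) -> int:
    """FILES(c): number of distinct files touched by the change."""
    _, touched = change
    return len(touched)


def freq(module: str, start: date, end: date) -> float:
    """Changes touching `module` completed in [start, end), per day of the interval."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("interval must have positive length")
    hits = sum(
        1
        for completed, touched in changes
        if start <= completed < end and any(m == module for m, _ in touched)
    )
    return hits / days
```

For example, `freq("billing", date(1995, 1, 1), date(1996, 1, 1))` counts the two 1995 changes that touch the hypothetical `billing` module and divides by 365 days.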

    [Excerpt from the paper shown on the slide]

    ... to implement, resulting in a higher risk of decay. In addition, a heavy requirement load is likely to have accreted over time, so that the code is doing things it was not designed to do.

    Inexperienced developers can be either new to programming or new to particular code.

    A sum over c denotes a sum over all changes. For a change c and software unit m, “c touches m” means that some part of m is changed by c. Also, the indicator of an event E is equal to one if E occurs and zero otherwise.

    In addition, several of the CDIs (all computable directly from the version management data base) depend on characteristics of changes:

    DELTAS(c) = number of deltas associated with c
    ADD(c) = number of lines added by c
    DEL(c) = number of lines deleted by c
    DATE(c) = the date on which c is completed, which we term the date of c
    INT(c) = the interval of c, the calendar time required to implement c
    DEV(c) = number of developers implementing c

    History of Frequent Changes

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Study Approach: (3) Finding Correlation

    • Linking risk factors to symptoms

    • Statistical regression

    • This requires designing some template models

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Study Approach: (3) Finding Correlation

    • Fault Potential I. The number of faults that will have to be fixed in module m in the future. Change effects are dampened over time

    • Fault Potential II. The number of faults that will have to be fixed in a module in the future. Faults are less likely in older code (when beta is =

    [Excerpt from the paper shown on the slide; formulas omitted]

    The size of a module m has already been de... up to time t and ... is discussed in Section …, and where … are estimated using statistical analysis.

    Both of these indices illuminate change as the primary agent creating ... Statistical analysis of this index appears in Section 5.4.

    5 THE EVIDENCE FOR DECAY

    In this section, we discuss some of our major results to date. All of these analyses are based on a single subsystem of the code, consisting of approximately 100 modules and 2,500 files. The change data consist of roughly 6,000 IMRs, 27,000 MRs, and 130,000 deltas. Some 500 different login names made changes to the code in this subsystem.

    The results yield very strong evidence that code does decay. First, in Section 5.1, statistical smoothing demonstrates that the span of changes (see (3)) increases over time, which is a clear symptom of code decay. This analysis is extended, in Section 5.2, by network-based visualizations showing that the increase in span is accompanied by (and may cause) a breakdown in the modularity of the code. Our other results show how decay affects two of the three key responses, namely, quality and effort.

    The CMI FILES of (3) measures the difficulty of a change by how many code units (files) need to be changed in order to implement it. An increase in the span of changes, then, is symptomatic of decay, as discussed in Section 4.2.2. Fig. 3 shows that span is increasing for the subsystem under study. There, we display the chance that at any given time an MR touches more than one file by smoothing data in which each point corresponds to an MR. A point's x-coordinate is time represented by the opening date and its y-coordinate is one when more than one file is touched and zero otherwise. Three local linear smooths (see, e.g., [21] and [22] for introduction and discussion) are shown in the top plot. These smooths are essentially weighted local averages, where the weights have a Gaussian shape and the widths of the windows (i.e., standard deviation of the weight function) are h = [...] (purple curve), h = [...] (multicolored curve), and h = [...] (blue curve).

    The central curve (multicolored) shows an initial downward trend, which is natural because many files are touched by common changes in the initial development phase, followed by a steady upward trend starting in 1990. This last trend reflects breakdown in the modularity of the code, as we discuss further in Section 5.2. That this is a substantial increase comes from the fact that values on the y-axis represent probabilities (local in time) that a change will touch more than one file, which more than doubles from a low of less than 2 percent in 1989 to more than 5 percent in 1996.

    In the absence of more detailed analysis, the results in the top plot in Fig. 3 depend on the window width h. The larger window width (blue curve) shows only the upward trend, while the smaller window width (purple curve) shows a lot of additional structure, which may be "microtrends" or may instead be spurious sampling artifacts. But how can we be sure? Furthermore, how do we know the features observed in the intermediate smooth, which contains the important lessons, are real?

    The bottom half of Fig. 3 is a SiZer map (footnote 11), which addresses this issue. Each location corresponds to a date and to a window width h and is shaded blue (red) when the smooth at that window width and date is statistically significantly increasing (decreasing, respectively). Regions where there is no significant change are shaded in the intermediate color purple.

    The smallest window width in the top plot is represented by the bottom white line in the lower plot and is shaded purple in the top plot, since this window width is shaded purple at all dates in the SiZer map. This is interpreted as "when the data are studied at this level of resolution, there are no significant increases or decreases," i.e., the wiggles in the curve are not statistically significant.

    The intermediate window width runs through both the red and blue shaded regions. This same coloring is used in the curve in the top plot, which shows that the structure is statistically significant. In particular, there is an important downward trend at the beginning and upward trend after 1990.

    EICK ET AL.: DOES CODE DECAY? ASSESSING THE EVIDENCE FROM CHANGE MANAGEMENT DATA

    11. SiZer, its properties, and some variations are discussed in [23].
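The top plot of Fig. 3 is a kernel smooth of a 0/1 indicator (does an MR touch more than one file?) against the MR opening date. The sketch below is a minimal illustration of that idea and not the authors' code: it assumes a Gaussian-weighted local linear fit, and the function name `local_linear_smooth`, the sample dates, the flags, and the two window widths are all made up for the example.

```python
import math


def local_linear_smooth(xs, ys, x0, h):
    """Local linear estimate at x0 using Gaussian weights with std dev h."""
    # Gaussian weight for every observation, centred on x0.
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    # Weighted moments for the closed-form weighted least squares fit.
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    denom = s0 * s2 - s1 * s1
    if abs(denom) < 1e-12:
        # Degenerate window: fall back to the weighted mean.
        return t0 / s0
    # Intercept of the local line, i.e. the fitted value at x0.
    return (s2 * t0 - s1 * t1) / denom


# Hypothetical data: MR opening dates (in years) and a flag that is 1
# when the MR touched more than one file.
dates = [1989.0, 1989.5, 1990.0, 1990.5, 1991.0, 1991.5, 1992.0, 1992.5]
multi_file = [0, 0, 0, 1, 0, 0, 1, 1]

# A small and a larger window width (both values are illustrative only).
for h in (0.3, 1.5):
    curve = [local_linear_smooth(dates, multi_file, t, h) for t in (1990.0, 1992.0)]
    print(h, [round(v, 3) for v in curve])
```

Running the loop with several values of `h` shows why the choice of window width matters: a narrow window follows individual points, while a wide one approaches a single straight-line fit. A local linear fit is not constrained to stay between 0 and 1, so values slightly outside that range can occur.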

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Results: (1) The span of changes increases over time?

    The large window width runs through the region that is shaded entirely blue in the bottom plot and, thus, inherits this color in the top plot. This shows that when the data are smoothed nearly to the point of doing a simple linear least squares fit, the resulting line slopes significantly upwards.

    These conclusions are complementary rather than inconsistent, because SiZer shows what is happening at each scale.

    5.2 Time Behavior of Modularity

    Alone, the increase in span of changes described in Section 5.1 does not imply breakdown of the modularity of the subsystem. Some increase in span could reflect simply

    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 1, JANUARY 2001

    Fig. 3. SiZer maps of numbers of files touched by change through time. The overall trend has been a significant increase in the difficulty of changes. At a finer resolution, there was a decreasing trend during the developmental phase of the subsystem when changes likely involved multiple files, but this trend reversed before long.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Results: (2) Breakdown of modularity increases over time?

    the growth of the subsystem and even changes of wide span need not cross module boundaries. The network visualization tool NicheWorks [24], which helps display structure (including clusters) in networks, allows us to address the question of whether modularity is breaking down over time, and leads to the results in Fig. 4, which suggest strongly that it is.

    The head of each tadpole-like shape in the upper left panel of Fig. 4 corresponds to a module; the positions of the modules have been chosen by NicheWorks in a manner that places pairs of modules nearby if they have been changed together as part of the same MRs a large number of times. More precisely, the weights are defined in terms of the "number of changes" CMI of (1), with that for modules m and m' being

    [Equation (7): the weight for modules m and m', built from the number of MRs touching both m and m' during I and the numbers of changes to m and to m' individually; the layout of the formula did not survive extraction]

    where I is an interval of time (see below) and where the joint count is the number of MRs touching both m and m'. In the upper left panel, the heads show this network view using all the change data through the end of calendar year 1988 (corresponding to one choice of I), while the tails of the segments display the same view at the end of 1987 (an earlier choice of I).

    In this way, one can see how relationships among modules evolved through time. In the top panel of Fig. 4, there are two main clusters of roughly a dozen modules each. However, in the center panel, which displays the change data through the end of 1989 in the locations of the heads (the 1988 data appear here as the locations of the segments' endpoints), these two clusters have mostly merged. The merging process continued and at the end of 1996 the clusters are no longer visible. While the logic of the code was originally intended to see to it that some modules would be essentially independent of each other, new and unanticipated functionality may have helped to destroy this independence.

    The weights in (7) constitute a compromise between simple counts (the number of MRs touching both m and m'), which tend to place too close together pairs of modules that are touched together frequently only because they are touched large numbers of times in total, and

    [Equation (8): the joint count for m and m' normalized by a term in the product of the individual counts for m and for m'; the layout of the formula did not survive extraction]

    which can be interpreted as a correlation (it is dimensionless and lies between zero and one), but which can exaggerate relationships between modules that are rarely touched at all.

    One shortcoming is that the weights of (7) (unlike those of (8)) are not invariant with numbers of changes. However, the following dispersion analysis provides further evidence of the decline of modularity. In 1988, the mean square distance between points in the small, eleven-point cluster and its centroid is [...], while the average distance between points in the larger, 26-point cluster and its centroid is [...]. The intercluster distance, or the distance between the centroids of the two clusters, is [...]. An intuitively appealing measure of distance between clusters, then, is the intercluster distance scaled by the two within-cluster distances, approximately [...]. The

    Fig. 4. Top: NicheWorks view of the modules in one subsystem using data through 1988 to place modules which have been changed at the same time close to one another. Two clusters of modules are evident: a module within one of these clusters is often changed together with other modules in the cluster but not with other modules. Center: NicheWorks view of the modules in the top left, this time including the change history through 1989. The clusters that appear in the top left view are converging in on each other. This suggests that the architecture that was previously separating the functionality of the two clusters of modules is breaking down. Bottom: The breakdown continued and at the end of 1996 there was no suggestion of multiple clusters of modules.


    If modules were changed together as part of the same MR, they are placed close to each other.
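The placement rule on this slide rests on co-change counts: how many MRs touched a given pair of modules. The sketch below shows one way to compute such counts from a list of MRs. It is an illustration under stated assumptions: the function name `cochange_weights` and the sample MRs are hypothetical, and the normalization used here (joint count divided by the square root of the product of the individual counts) is a common cosine-style choice, not necessarily the exact weight used in the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cochange_weights(mrs):
    """Count co-changes per module pair and derive a normalized weight.

    mrs: iterable of sets, each holding the module names touched by one MR.
    Returns (pair_counts, weights), both keyed by sorted (a, b) tuples.
    """
    single = Counter()  # number of MRs touching each module
    pair = Counter()    # number of MRs touching both modules of a pair
    for modules in mrs:
        mods = sorted(set(modules))
        single.update(mods)
        pair.update(combinations(mods, 2))
    # Assumed normalization: joint count over the geometric mean of the
    # two individual counts (always between 0 and 1).
    weights = {
        (a, b): n_ab / math.sqrt(single[a] * single[b])
        for (a, b), n_ab in pair.items()
    }
    return pair, weights


# Hypothetical change history: each set is one MR.
history = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}]
pair_counts, weights = cochange_weights(history)
for key in sorted(weights):
    print(key, pair_counts[key], round(weights[key], 3))
```

A layout tool would then place pairs with larger weights closer together. Raw counts favor modules that are simply changed often, which is the reason for dividing by the individual counts.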

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Results: (3) Fault potential, the likelihood of changes to induce faults, increases over time

    analogous quantity for 1989 is [...]. After the large decrease in 1989, this measurement continues to shrink, albeit not as rapidly, reaching [...] in 199[...].

    In Section 3.1, we identified quality as one of three key responses to code decay. Here we summarize research linking faults in the software to symptoms of code decay, using the predictive CMIs FP^WTD of (4) and FP^GLM of (5). More complete discussion of this fault potential modeling appears in [20]. In this work, the authors counted the numbers of faults whose fixes touched each of the modules in a subsystem of the 5ESS code in a two year period and developed statistical models for these fault counts using measurements on the modules that were calculable before the start of the two year period.

    The thrust of these models is to predict the distribution of future faults over modules in the subsystem from the modules' change history. The best models predicted numbers of faults using numbers of changes to the module in the past, the dates of these changes (i.e., the negative of their ages, measured in years), and their sizes, as in (4):

    [Equation (9): FP^WTD(m), a sum over the past deltas δ to module m that combines an exponential weight in α × DATE(δ) with log(ADD(δ, m) + DEL(δ, m)); the layout of the formula did not survive extraction]

    with the parameter α ≈ [...] determined by statistical analysis (see [20]). Thus, large, recent changes add the most to fault potential and the number of times a module has been changed is a better predictor than its size of the number of faults it will suffer in the future. That α > 0 is the primary (and direct) evidence that changes induce faults: Were α = 0, past changes of the same size would be indistinguishable from one another and, hence, none could be posited to have any specific effect.

    The model (9) does provide evidence that some modules are more decayed than others. In principle, this issue could be addressed by allowing α to be module dependent, but we have not yet done this.

    An alternative (and less powerful) model, using the CMI of (5) and the same data as (9), a generalized linear model, is

    [Equation (10): FP^GLM(m), a generalized linear model with a term in AGE(m); the layout of the formula did not survive extraction]

    This model implies that code having many lines that have survived for a long time is likely to be relatively free of faults. More precisely, according to (10), code a year older than otherwise similar code tends to have only two-thirds as many faults.

    One way to evaluate these models is by comparison with a "naive" model that predicts the number of future faults in given locations to be proportional to the number of past faults. As discussed in [20], in some cases, (10) is only marginally superior to the naive model (as measured by a Poisson deviance). Nevertheless, this still means that a model suggesting causality (deltas cause faults) has the same explanatory power as a model positing simply that the distribution of faults over modules is stationary over time.

    Simulations of deviances provide strong evidence that the model (9) is superior to that of (10). In particular, this means that treating changes individually improves the predictions.

    Equally important is that other variables did not improve the predictions, once size and time of changes are taken into account. In particular, predictions do not improve by including either module size or other measures (metrics) of software complexity (which in our data are correlated essentially completely with size). Thus, changes to code are more responsible for faults than the complexity of the code.

    Moreover, the number of developers touching a module had no effect on its fault potential. One possible explanation is that strong organization programming standards attenuate any such effects. The change from code ownership to change ownership ([...]
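The weighted time damp idea behind the fault potential results (large, recent changes count the most) can be sketched in a few lines. This is a hedged illustration rather than the paper's exact model: it assumes each past delta contributes an exponential time weight multiplied by the log of its size, it leaves out any fitted scaling constant, and the function name `fault_potential_wtd`, the value of `alpha`, and the sample deltas are all hypothetical.

```python
import math


def fault_potential_wtd(deltas, alpha):
    """Weighted-time-damp style score for one module.

    deltas: list of (date, added, deleted) tuples, where date is the
            negative age of the delta in years (0.0 means "now").
    alpha:  damping rate; alpha = 0 ignores when a change happened.
    """
    score = 0.0
    for date, added, deleted in deltas:
        size = added + deleted
        if size <= 0:
            continue  # an empty delta contributes nothing
        # Older deltas (more negative date) are damped more strongly.
        score += math.exp(alpha * date) * math.log(size)
    return score


# Hypothetical change histories for two modules with identical sizes of
# change; only the timing differs.
recently_changed = [(-0.2, 40, 10), (-0.5, 15, 5)]
changed_long_ago = [(-4.0, 40, 10), (-5.0, 15, 5)]

alpha = 0.75  # illustrative value only
print(round(fault_potential_wtd(recently_changed, alpha), 3))
print(round(fault_potential_wtd(changed_long_ago, alpha), 3))
```

With `alpha` set to zero the two modules receive the same score, which mirrors the argument that a positive damping rate is what makes the timing of changes informative.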

    5.4 Models for Effort

    Here, we assess the evidence for "bottom line" relevance of code decay: Can the effort required to implement changes be predicted from symptoms and risk factors for decay? The analysis employs a variant of the predictive CMI EFF of ([...]

    [...] Section 2.1, features are the units of system functionality (e.g., call waiting) by which the system is extended and are too aggregated for most purposes. However, effort data (person hours) are available only at this level. (Further analysis of factors affecting effort, based on larger data sets but requiring the imputation of effort for individual changes given aggregated effort values, is in [25].)

    Extreme variability of the feature-level data necessitated taking logarithms of all variables. (The actual transformation, log(1 + ·), avoids negative numbers.) The resultant model is

    [Equation (11): log(1 + EFF) modeled with terms in log(1 + FILES), log(1 + DEL), log(1 + ADD) × log(1 + DEL), log(1 + INT), and log(1 + DELTAS); the coefficients did not survive extraction]

    All coefficients shown are statistically significantly different from zero; the multiple R² value is [...].

    [...] Section 5.2 may not be harmful but since the size of changes is correlated with their span, it is more likely that we are simply seeing the size variable mask the effect of span.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Results: (4) Prediction of efforts increases over time

    File span has a positive correlation with effort. Large deletions are implemented rather easily.

    The hardest changes require both additions and deletions. A large number of editing changes is rather easy to implement.
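Effort data of this kind are highly skewed, so a regression is usually fit after a log(1 + x) transform of every variable. The sketch below shows that step for a single predictor. It is a simplified stand-in for the multi-term effort model, assuming ordinary least squares on the transformed values; the function name `fit_log1p_line` and the sample feature data are hypothetical.

```python
import math


def fit_log1p_line(xs, ys):
    """Least squares fit of log(1 + y) = b0 + b1 * log(1 + x).

    Returns (b0, b1). The log1p transform keeps zero counts usable,
    since log(1 + 0) = 0.
    """
    lx = [math.log1p(x) for x in xs]
    ly = [math.log1p(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical features: number of files touched and person hours spent.
files_touched = [1, 2, 2, 5, 8, 13, 20]
person_hours = [3, 6, 4, 20, 35, 60, 150]

b0, b1 = fit_log1p_line(files_touched, person_hours)
print(round(b0, 3), round(b1, 3))
```

A positive slope on this toy data corresponds to the first bullet above: features that span more files take more effort. The full model adds further terms (deletions, additions times deletions, interval, number of deltas), which would call for a multiple regression routine rather than this one-variable formula.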


  • Any unexpected results?

    • The number of developers does not have much impact

    • Complexity metrics did not play much of a role compared to size and number of changes

    Expected results?

    •...

  • Any unexpected results?

    • Once size and time of changes are taken into account, other variables (e.g., number of developers, complexity metrics, span of files) did not play much of a role in predicting faults.

    • A large number of editing changes is rather easy to implement.

    • In the beginning of evolution, file span was relatively large.

    Expected results?

    • File span increases over time.

    • Size and time of changes have some positive correlation with fault potential.

    • Modularity degrades over time.

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Threats to Validity? Limitations?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Four types of threats to validity

    • External Validity: Can we generalize to other situations?

    • Internal Validity: Assuming that there is a relationship in this study, is the relationship a causal one?

    • Construct Validity: Assuming that there is a causal relationship in this study, can we claim that the program reflects our construct of the program well and that the measure reflects our construct of the measure well?

    • Conclusion Validity: Is there a relationship between the cause and effect?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Another way of looking at Validity

    Cause Construct

    Effect Construct

    Program

    Theory: what you think

    Observation: what you test

    cause effect constructs

    Observations

    program-outcome

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Example: WWW => Student Learning

    WWW virtual classroom

    Understanding

    Second life instruction

    Theory: WWW virtual classroom improves student understanding of course materials

    Observation: Let one half of EE382V use the Second Life virtual classroom and let the other half come to the regular lecture. Compare their test scores at the end.

    cause effect constructs

    Test score

    program-outcome

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    External Validity

    WWW virtual classroom

    Understanding

    Second life instruction

    Theory: WWW virtual classroom improves student understanding of course

    Observation: Let one half of EE382V to use www site and let the other half

    cause effect constructs

    Test score

    program-outcome

    External Validity: Does this study generalize to students of EE322c?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Internal Validity

    WWW virtual classroom

    Understanding

    Second life instruction

    Theory: WWW virtual classroom improves student understanding of course

    Observation: Let one half of EE382V to use www site and let the other half

    cause effect constructs

    Test score

    program-outcome

    Internal Validity: Assuming that students using WWW did better on their test, isn't it because these students have more money (apparently they have computers and high-speed internet) and richer students have more experience with objective tests (due to their parents sending them to prep schools)?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Construct Validity

    WWW virtual classroom

    Understanding

    Second life instruction

    Theory: WWW virtual classroom improves student understanding of course

    Observation: Let one half of EE382V to use www site and let the other half

    cause effect constructs

    Test score

    program-outcome

    Construct Validity: Is the operationalization method valid? Do objective test scores truly reflect students’ understanding of core concepts?

    Don’t students who are familiar with second life interface just test better?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Conclusion Validity

    WWW virtual classroom

    Understanding

    Second life instruction

    Theory: WWW virtual classroom improves student understanding of course

    Observation: Let one half of EE382V to use www site and let the other half

    cause effect constructs

    Test score

    program-outcome

    Conclusion Validity: Is the correlation between Second Life virtual classroom use and test scores significant?
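One simple way to ask whether a difference in test scores between the two halves of the class is significant is a permutation test. The sketch below is an illustration only: the function name `permutation_p_value` and the scores are invented, and a two-sided test on the difference of group means is just one reasonable choice of statistic.

```python
import random


def permutation_p_value(group_a, group_b, trials=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(trials):
        # Reassign students to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical test scores for the two halves of the class.
virtual_classroom = [88, 92, 79, 85, 90, 94]
regular_lecture = [81, 77, 84, 70, 79, 83]
print(permutation_p_value(virtual_classroom, regular_lecture))
```

A small p-value would support conclusion validity only; it says nothing about whether the relationship is causal (internal validity) or whether test scores capture understanding (construct validity).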

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Cause Construct Effect Construct

    Program

    cause effect constructs

    Observations

    program-outcome

    1. Temporal Behavior of the Span of Code Changes

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    1. Temporal Behavior of the Span of Code Changes

    • External Validity

    • Internal Validity

    • Construct Validity

    • Conclusion Validity

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Cause Construct Effect Construct

    Program

    cause effect constructs

    Observations

    program-outcome

    2. Time Behavior of Modularity

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    2. Time Behavior of Modularity

    • External Validity

    • Internal Validity

    • Construct Validity

    • Conclusion Validity

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Cause Construct Effect Construct

    Program

    cause effect constructs

    Observations

    program-outcome

    3. Prediction of Faults

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    3. Prediction of Faults

    • External Validity

    • Internal Validity

    • Construct Validity

    • Conclusion Validity

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Cause Construct Effect Construct

    Program

    cause effect constructs

    Observations

    program-outcome

    4. Models of Effort

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    4. Models of Effort

    • External Validity

    • Internal Validity

    • Construct Validity

    • Conclusion Validity

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    My general thoughts on Code Decay Paper

    • Rich data set!!!

    • Scientific research method

    • Identification of hypotheses => identify key variables and measure them => create statistical models => statistical regression

    • What do the identified coefficients really mean?

    • Can programmers use any of these findings for daily development activities?

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Recap

    • Code decay can be mapped to specific measured / derived variables.

    • e.g., span of changes => file span, non-localized changes => changes that span module boundaries

    • Early mining-software-repositories research from the late 1990s, based on statistical regression analysis and visualization

    • These types of research require having good insights.

    • e.g., weighted time dampened model

    • Identified which factors do matter! => some surprising results, e.g., that complexity metrics do not matter

  • EE382V Software Evolution: Spring 2009, Instructor Miryung Kim

    Limitations

    • Carefully designed model that may have over-fitted data?

    • Their model did not consider change types or change content

    • Their model cannot handle module specific information

    • Their results do not generalize to other systems because most changes in open source systems do not map to a logical software change