MTAT.03.243 / Lecture 07 / © Dietmar Pfahl 2015
MTAT.03.243
Software Engineering Management
Lecture 07:
SPI & Measurement –
Part A
Dietmar Pfahl
email: [email protected] Spring 2015
Announcement – ATI Career Day
Friday
Announcement –
Industry Guest Lecture
“Increasing the predictability of software delivery with lean processes” by Marek Laasik (VP Engineering at Fortumo)
Monday 23 March
Project – Short Presentation on March 25
• Purpose:
– to present initial ideas about the improvement project you
intend to plan, and
– to get feedback regarding suitability
• Duration: 3-5 min / max. 3 slides
• Content:
– (1) Context of the proposed improvement project,
– (2) Issues to be addressed/resolved and corresponding
improvement goals,
– (3) Optional: Sketch of the process changes you suggest
to make in order to achieve the improvement goals
Structure of Lecture 7
• Motivation and Definitions (Measure, Measurement)
• Example Measures (Process, Product, Resource)
• Subjective Measurement
Project Planning and Control
All activities aim at matching the actual course of the SW project with its planned course.
Types of Process Models
[Figure 1: Types of processes]
• Processes
  – Engineering Processes
    • Product-Engineering Processes: Development Processes, Maintenance Processes, Project Mgmt Processes, Quality Mgmt Processes, Conf Mgmt Processes, Product Line Processes
    • Process-Engineering Processes: Improvement Processes, Process Modeling Processes, Measurement Processes
  – Non-Engineering Processes
    • Business Processes
    • Social Processes
[Figure 2: Process taxonomy (PROFES)]
• Software Knowledge: Process Models, Product Models, Quality Models, Life Cycle Models, . . .
• Process Models: Engineering Process Models (Technical Process Models, Managerial Process Models, Process Engineering Proc. Models), Business Process Models, Social Process Models, . . .
SPI Planning and Control
All activities aim at matching the actual course of the SPI project (SPIP) with its planned course.
[Figure: SPIP life cycle (SPIP Start, SPIP Planning, SPIP Enactment, SPIP Control, SPIP Steering, SPIP End), guided by a process improvement model and its context]
Measurement in PROFES
Definitions:
Measurement and Measure
Measurement:
• Measurement is the process through which values (e.g., numbers) are assigned to attributes of entities of the real world.
Measure:
• A measure is the result of the measurement process, so it is the assignment of a value to an entity with the goal of characterizing a specified attribute.
Source: Sandro Morasca, “Software Measurement”, in “Handbook of Software Engineering and Knowledge Engineering - Volume 1: Fundamentals” (refereed book), pp. 239 - 276, Knowledge Systems Institute, Skokie, IL, USA, 2001, ISBN: 981-02- 4973-X.
[Figure: measurement as a mapping: entities a–e of the real world (set A) are assigned values 0–4 (set B) by a Size Measure; Entity: Program, Attribute: Size, Scale & Unit: LOC (lines of code)]
Software Measurement Challenge
• Measuring physical properties (attributes):
entity | attribute   | unit* | scale (type) | value | range*
Human  | Height      | cm    | ratio        | 178   | (1, 300)
Human  | Temperature | °C    | interval     | 37    | (30, 45)
• Measuring non-physical properties (attributes):
entity  | attribute       | unit* | scale (type) | value | range*
Human   | Intelligence/IQ | index | ordinal      | 135   | [0, 200]
Program | Modifiability   | ?     | ?            | ?     | ?
• Software properties are usually non-physical:
size, complexity, functionality, reliability, maturity, portability, flexibility, understandability, maintainability, correctness, testability, coupling, coherence, interoperability, …
* ‘unit’ and ‘range’ are sometimes used synonymously with ‘scale’
Measurement – What is meaningful?
Some statements:
1. ‘I am twice as tall as you!’
2. ‘In Madrid it’s twice as hot (on average) as in Tartu during summer!’
3. ‘Software X is more complex than software Y!’
4. ‘Software X is twice as complex as software Y!’
5. ‘On average, our software has complexity 3.45!’
6. ‘On average, our software has ‘high’ complexity!’
• Which statements are meaningful?
• What statistics (e.g., mode, median, mean) and what statistical tests could be applied (e.g., parametric vs. non-parametric)?
Measurement Scale Types
Measurement Scale Types – cont’d
The classification of scales has an important impact on their practical use, in particular on the statistical techniques and indices that can be used. Example: Indicator of central tendency of a distribution of values (“Location”).
Mode = most frequent value of the distribution.
Median = the value such that not more than 50% of the values of the distribution are less than it, and not more than 50% of the values are greater than it.
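Both location indicators can be computed directly; a minimal Python sketch with illustrative ordinal ratings (the sample values are invented, not from the lecture):

```python
import statistics

# Ordinal complexity ratings, coded 1 (very low) .. 5 (very high) -- made-up sample
ratings = [2, 3, 3, 4, 5]

print(statistics.mode(ratings))    # 3 -> usable from nominal scales upward
print(statistics.median(ratings))  # 3 -> usable from ordinal scales upward
```

The arithmetic mean would additionally require at least an interval scale.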
Scale Types and Meaningful Measurement
• Scales are defined through their admissible transformations
• Scales (and their admissible transformations) help us decide
– whether a statement involving measures is meaningful
– what type of statistical analyses we can apply
• Definition of Meaningfulness:
A statement S with measurement values (i.e., measures m1,
…, mn) is meaningful iff its truth or falsity value is invariant
under admissible transformations Tr.
iff: “if and only if”
Tr(S[m1, …, mn]) is true iff S[Tr(m1), …, Tr(mn)] is true
Meaningfulness of Measurement-Based Statements
Definition:
A statement involving measures is
meaningful, if its truth value remains
unchanged under any admissible
transformation of its scale type
Example:
“In Madrid, during summer, it’s on average
twice as hot as in Tartu” (measured on the
Celsius scale: e.g., 40 C vs. 20 C)
-> Meaningful?
Meaningfulness of Measurement-Based Statements
Procedure to check for meaningfulness:
1. Apply the admissible transformation to
measures in a statement S and obtain a
transformed statement S’.
2. If S’ can be shown to be equivalent to S,
then the statement S is meaningful for the
scale associated with the admissible
transformation.
Meaningfulness of Measurement-Based Statements
Example:
“In Madrid, during summer, it’s on average twice as
hot as in Tartu” (measured on the Celsius scale: e.g.,
40 C vs. 20 C) -> Meaningful?
---
Statement: TM = 2*TT
The Celsius scale is of type ’interval’ (m’=a*m + b, a>0)
To check:
TM’ = 2*TT’ (?) under assumption that TM = 2*TT is true
Proof:
(1) TM’ = a*TM+b = a*(2*TT)+b
(2) 2*TT’ = 2*(a*TT+b) = a*(2*TT)+2*b
We see: (1) = (2) only if b = 0
-> easy to construct a counter-example with b ≠ 0
Thus: the statement is not meaningful
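The counter-example can be made concrete with the Celsius-to-Fahrenheit transformation (a = 9/5, b = 32, so b ≠ 0); a small sketch:

```python
# Celsius -> Fahrenheit is an admissible interval-scale transformation with b != 0
def c_to_f(c):
    return 9 / 5 * c + 32

t_madrid, t_tartu = 40.0, 20.0
print(t_madrid == 2 * t_tartu)                  # True:  40 = 2 * 20 (Celsius)
print(c_to_f(t_madrid) == 2 * c_to_f(t_tartu))  # False: 104 != 2 * 68 (Fahrenheit)
```

The truth value changes under an admissible transformation, so the "twice as hot" statement is not meaningful on an interval scale.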
Example Interval Scales: Fahrenheit & Celsius
Meaningfulness – Example 1
• Is statement (1) on the right meaningful, if X is measured on a ratio scale?
(1) (x1 + x2) / 2 > m
(2) (a·x1 + a·x2) / 2 > a·m    (Ratio Scale)
Meaningfulness – Example 1
• Is statement (1) on the right meaningful, if X is measured on a ratio scale?
• Apply any admissible transformation M’=aM (a>0) for ratio scales:
(1) (x1 + x2) / 2 > m
(2) (a·x1 + a·x2) / 2 > a·m    (Ratio Scale)
Meaningfulness – Example 1
• Is statement (1) on the right meaningful, if X is measured on a ratio scale?
• Apply any admissible transformation M’=aM (a>0) for ratio scales:
• By arithmetic manipulation, the transformed statement (2) can always be shown to be equivalent to (1), for any admissible transformation. Therefore, statement (1) is meaningful for a ratio scale.
(1) (x1 + x2) / 2 > m
(2) (a·x1 + a·x2) / 2 > a·m    (Ratio Scale)
Meaningfulness – Example 2
• Is statement (1) on the right meaningful, if X is measured on an interval scale?
(1) (x1 + x2) / 2 > m    (Interval Scale)
Meaningfulness – Example 2
• Is statement (1) on the right meaningful, if X is measured on an interval scale?
• Apply any admissible transformation M’=aM+b (a>0) for interval scales:
• By arithmetic manipulation, the transformed statement (2) can always be shown to be equivalent to (1), for any admissible transformation. Therefore, statement (1) is meaningful for an interval scale.
(1) (x1 + x2) / 2 > m
(2) ((a·x1 + b) + (a·x2 + b)) / 2 > a·m + b    (Interval Scale)
Meaningfulness – Example 3
• Is statement (1) on the right meaningful, if X is measured on an ordinal scale?
• Apply an admissible transformation for ordinal scales, e.g., x’ = x³:
• For any pair of measurements x1 and x2, there always exists an admissible transformation such that statement (2) is false while (1) is true. Therefore, statement (1) is not meaningful for an ordinal scale.
(1) (x1 + x2) / 2 > m
(2) (x1³ + x2³) / 2 > m³    (Ordinal Scale)
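A concrete counter-example for the cube transformation; the values below are chosen here for illustration (not from the slides; note one of them is negative):

```python
def cube(x):
    # x -> x**3 is strictly monotone, hence admissible for ordinal scales
    return x ** 3

x1, x2, m = -3, 1, -1.1  # illustrative values
print((x1 + x2) / 2 > m)                    # True:  -1.0 > -1.1
print((cube(x1) + cube(x2)) / 2 > cube(m))  # False: -13.0 > -1.331 does not hold
```

The truth value flips under an admissible transformation, so comparing means is not meaningful on an ordinal scale.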
Meaningfulness – Geometric Mean
• The geometric mean of a data set [a1, a2, ..., an] is given by (a1 · a2 · ... · an)^(1/n)
• On which scale type is the geometric mean meaningful?
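A minimal sketch of the geometric mean, illustrating why it behaves well on a ratio scale: rescaling all inputs by a factor a rescales the result by the same factor.

```python
import math

def geometric_mean(values):
    # n-th root of the product of n positive values
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([2, 8]))            # 4.0
print(geometric_mean([10 * 2, 10 * 8]))  # 40.0 -- scaled by the same factor 10
```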
Structure of Lecture 7
• Motivation and Definitions (Measure, Measurement)
• Example Measures (Process, Product, Resource)
• Subjective Measurement
Measurable Entities in a SW Process (Model)
An entity can represent any of the following:
• Process/Activity: any activity (or set of
activities) related to software
development and/or maintenance (e.g.,
requirements analysis, design, testing) –
these can be defined at different levels
of granularity
• Product/Artifact: any artifact produced or
changed during software development
and/or maintenance (e.g., source code,
software design documents)
• Resources: people, time, money,
hardware or software needed to perform
the processes
[Figure: process model element: an Activity transforms Product_in into Product_out, using Resource_tool and Resource_role]
Examples of Software Product Attributes
• Size
–Length, Complexity,
Functionality
• Modularity
• Cohesion
• Coupling
• Quality
• Value (Price)
• ...
• Quality (-> ISO 9126)
–Functionality
–Reliability
–Usability
–Efficiency
–Maintainability
–Portability
Product Measure – Ex. 1: ’Code Size’
• Entity: Code module
• Attribute: Size (or better: Length)
• Unit: Net Lines of Code (NLOC)
• Scale Type: Ratio
• Range: (0, ∞)
• Who collects/reports the data? Developer
• When (how often) is the data collected? Once, at end of week
• How is the data collected? Using tool ’CoMeas’
• Who is responsible for data validity? Project Manager
Product Measure – Ex. 2: ’Code Quality 1’
• Entity: Code module (class file)
• Attribute: Quality (or better: Correctness)
• Unit: Defects (Def)
• Scale Type: Ratio
• Range: [0, ∞)
• Who collects/reports the data? Developer
• When (how often) is the data collected? Continuously during unit testing
• How is the data collected? Using defect reporting tool ’TRep’
• Who is responsible for data validity? Project Manager
Product Measure – Ex. 3: ’Code Quality 2’
• Entity: Code module (file)
• Attribute: Quality (or better: Defect Density)
• Unit: Def / NLOC
• Scale Type: Ratio
• Range: [0, ∞)
• Who collects/reports the data? Developer
• When (how often) is the data collected? Continuously during unit testing
• How is the data collected? Using tools ’TRep’ and ’CoMeas’
• Who is responsible for data validity? Project Manager
Common OO Code Measures
Measure                                | Desirable Value
Coupling                               | Lower
Cohesion                               | Higher
Cyclomatic Complexity                  | Lower
Method Hiding Factor                   | Higher
Attribute Hiding Factor                | Higher
Depth of Inheritance Tree              | Low (tradeoff)
Number of Children                     | Low (tradeoff)
Weighted Methods Per Class             | Low (tradeoff)
Number of Classes                      | Higher (with identical functionality)
Lines of Code (net and total; comment) | Lower (with identical functionality)
Churn (new + changed LoC)              | Lower (with identical functionality)
Complexity – McCabe
Measure: Cyclomatic Complexity (CC)
Desirable Value: Lower
Description: Defines the number of independent (simple) paths in a Control Flow Graph (CFG).
Draw the CFG, then calculate CC as follows:
CC = #(edges) – #(nodes) + 2
CC = #(decisions) + 1
Example (CFG in figure): CC = 5 + 1 = 6
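Both formulas can be checked on a small control-flow graph; the sketch below uses a hypothetical mini-CFG (a single if/else), not the graph from the slide:

```python
# CFG as an adjacency list: entry -> if -> (then | else) -> exit
cfg = {
    "entry": ["if"],
    "if": ["then", "else"],  # the only decision node (out-degree > 1)
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}

nodes = len(cfg)
edges = sum(len(succ) for succ in cfg.values())
decisions = sum(1 for succ in cfg.values() if len(succ) > 1)

print(edges - nodes + 2)  # 2, via CC = #(edges) - #(nodes) + 2
print(decisions + 1)      # 2, via CC = #(decisions) + 1
```

Both formulas agree, as expected for a single-entry, single-exit graph.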
Direct vs. Indirect (Derived) Measures
• Direct measure: a measure that directly characterizes an empirical
property and does not require the prior measurement of some other
property
• Indirect measure: uses one or more (direct or indirect) measures of one
or more attributes in order to measure, indirectly, another supposedly
related attribute.
– Requires first the measurement of two or more attributes,
– then it combines them using a mathematical model.
speed = distance / time [km/h]
accuracy = ( |actual – estimate| / estimate ) * 100% [Percentage]
Is ’estimate’ a measure?
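The accuracy formula above in use; a minimal sketch with made-up effort numbers:

```python
def estimation_accuracy(actual, estimate):
    # Relative estimation error in percent, per the formula above
    return abs(actual - estimate) / estimate * 100

# Illustrative values: effort was estimated at 100 person-hours, actual was 120
print(estimation_accuracy(actual=120, estimate=100))  # 20.0 (percent)
```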
Indirect Measures
• Examples:
– Defect Density (DD)
– Reliability (Rel)
– Productivity (Prod)
• Scale type of an indirect measure M will generally be the
weakest of the scale types of the direct measures M1, …,
Mn
Indirect Measures
• Examples:
– DD = Quality 1 / Size [Unit: #Def/NLOC]
– Reliability = Quality 1 / Time [#Def/hour]
– Productivity 1 = Size / Time [NLOC/hour]
– Productivity 2 = Size / Effort [NLOC/person-hour]
– ...
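These indirect measures combine direct measures arithmetically; a sketch with invented raw values for one module:

```python
# Direct measures for one code module -- illustrative values, not real data
nloc = 500          # size [NLOC]
defects = 4         # Quality 1 [#Def]
hours = 25          # time [hours]
person_hours = 50   # effort (two developers) [person-hours]

print(defects / nloc)       # 0.008 -> Defect Density [#Def/NLOC]
print(nloc / hours)         # 20.0  -> Productivity 1 [NLOC/hour]
print(nloc / person_hours)  # 10.0  -> Productivity 2 [NLOC/person-hour]
```

Since all four direct measures are on ratio scales, the derived measures are on ratio scales as well.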
Subjective – Objective
Quantitative – Qualitative
                               | Subjective | Objective
Qualitative (nominal, ordinal) |     ?      |     ?
Quantitative (interval, ratio) |     ?      |     ?
Subjective – Objective
Quantitative – Qualitative
• Assume you measure 8 times the same attribute of the
same entity (A: size [LOC] – B: complexity [?])
1. A: 120 – A: 120 – A: 119 ---- B: 4 – B: 4 – B: high
2. A: 124 – A: 120 – A: 121 ---- B: 4 – B: 4 – B: high
3. A: 120 – A: 120 – A: 124 ---- B: 4 – B: 5 – B: very high
4. A: 120 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
5. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 3 – B: medium
6. A: 120 – A: 120 – A: 122 ---- B: 4 – B: 4 – B: high
7. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
8. A: 124 – A: 120 – A: 124 ---- B: 4 – B: 4 – B: high
Six different Measurement Series
Subjective – Objective
Quantitative – Qualitative
• Assume you measure 8 times the same attribute of the
same entity (A: size [LOC] – B: complexity [?])
1. A: 120 – A: 120 – A: 119 ---- B: 4 – B: 4 – B: high
2. A: 124 – A: 120 – A: 121 ---- B: 4 – B: 4 – B: high
3. A: 120 – A: 120 – A: 124 ---- B: 4 – B: 5 – B: very high
4. A: 120 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
5. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 3 – B: medium
6. A: 120 – A: 120 – A: 122 ---- B: 4 – B: 4 – B: high
7. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
8. A: 124 – A: 120 – A: 124 ---- B: 4 – B: 4 – B: high
Guess: Columns 1 to 5 are quantitative. BUT: Columns 4 & 5 might be labels (not numbers).
Subjective – Objective
Quantitative – Qualitative
• Assume you measure 8 times the same attribute of the
same entity (A: size [LOC] – B: complexity [?])
1. A: 120 – A: 120 – A: 119 ---- B: 4 – B: 4 – B: high
2. A: 124 – A: 120 – A: 121 ---- B: 4 – B: 4 – B: high
3. A: 120 – A: 120 – A: 124 ---- B: 4 – B: 5 – B: very high
4. A: 120 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
5. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 3 – B: medium
6. A: 120 – A: 120 – A: 122 ---- B: 4 – B: 4 – B: high
7. A: 124 – A: 120 – A: 120 ---- B: 4 – B: 4 – B: high
8. A: 124 – A: 120 – A: 124 ---- B: 4 – B: 4 – B: high
Guess: Columns 2 and 4 are objective. BUT: What if column 4 had the value ’high’?
Types and Uses of Measures
• Types of Measures
  – Direct vs. Indirect
  – Subjective vs. Objective
    • Has to do with the measurement process (human involvement, reliability)
  – Qualitative vs. Quantitative
    • Has to do with the scale type
• Uses of Measures
  – Assessment vs. Prediction
    • NB: Measurement for prediction requires a prediction model
Measurable Entities in a SW Process (Model)
An entity can represent any of the following:
• Process/Activity: any activity (or set of
activities) related to software
development and/or maintenance (e.g.,
requirements analysis, design, testing) –
these can be defined at different levels of
granularity
• Product/Artifact: any artifact produced or
changed during software development
and/or maintenance (e.g., source code,
software design documents)
• Resources: people, time, money,
hardware or software needed to perform
the processes
[Figure: process model element: an Activity transforms Product_in into Product_out, using Resource_tool and Resource_role]
Examples of Software Process and Resource
Attributes that can be measured
• Process-related:
  – Efficiency: How fast (time, duration), how much effort (effort, cost), how much quantity/quality per time or effort unit (velocity, productivity)?
  – Effectiveness: Do we get the results (quantity/quality) we want? – e.g., test coverage
  – Capability: CMMI level
• Resource-related:
  – People: Skill, knowledge, experience, learning, motivation, personality
  – Organisation: Maturity
  – Method/Technique/Tool: Effectiveness, efficiency, learnability, cost
Process Measure – Ex. 1: ’Acceptance Test Time’
• Entity: Acceptance Test
• Attribute: Time (or ’Duration’)
• Unit: Calendar Day
• Scale Type: Interval or Ratio
• Range: [0, ∞)
• Who collects/reports the data? Customer XYZ
• When (how often) is the data collected? At end of every test day
• How is the data collected? Using reporting template ’RT’
• Who is responsible for data validity? Product Owner
Process Measure – Ex. 2: ’Coding Effort’
• Entity: Coding
• Attribute: Effort
• Unit: Person-hour
• Scale Type: Ratio
• Range: [0, ∞)
• Who collects/reports the data? Developer
• When (how often) is the data collected? At end of every work day
• How is the data collected? Using reporting template ’RE’
• Who is responsible for data validity? Project Manager
‘Time’ versus ‘Effort’
Time:
• Entity: Some Activity (e.g., Test)
• Attribute: Time (or Duration)
• Unit: Year, Month, Week, (Work) Day, Hour, Minute, Second, ...
• Range: [0, ∞)
• Scale type: Ratio
• Characterisation: Direct, Quantitative, Objective/Subjective ???

Effort:
• Entity: Some Activity (e.g., Test)
• Attribute: Effort
• Unit: Person-Year, …, Person-Day, Person-Hour, …
• Range: [0, ∞)
• Scale type: Ratio
• Characterisation: Direct, Quantitative, Objective/Subjective ???
Effort vs. Time Trade-Off
[Figure: three schedules with identical total effort of 4 person-days (pd) but different staffing, e.g., 1 person for 4 days, 4 persons for 1 day, 2 persons for 2 days]
What does it mean when I say:
• ”This task takes 4 days”
• ”This task needs 4 person-days”
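The distinction can be sketched in code: duration depends on staffing while effort stays fixed (assuming perfectly parallelizable work, which real tasks rarely are):

```python
effort_pd = 4  # "this task needs 4 person-days"

for persons in (1, 2, 4):
    # Idealized assumption: the work parallelizes perfectly across persons
    duration_days = effort_pd / persons
    print(f"{persons} person(s) -> {duration_days} day(s)")
# 1 person(s) -> 4.0 day(s); 2 person(s) -> 2.0 day(s); 4 person(s) -> 1.0 day(s)
```

"Takes 4 days" is a statement about duration; "needs 4 person-days" is a statement about effort.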
Agile Measurement:
Sprint Burndown Chart – Example
[Figure: sprint burndown chart, derived from the sprint backlog (task list)]
Agile Measurement: Burn-Down & Burn-Up
Both can be used to calculate (average) team velocity = Story Points (or: Stories) per Team per Sprint
Agile Measurement: Velocity [Story Points / Sprint]
• Solid agile teams have consistent velocity (+/- 20%)
• Fluctuations? -> Look to stabilize team / environment
• Velocity trending up/down? -> Look at technical debt handling (rework) and team dynamics ...
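The consistency rule of thumb can be checked mechanically; a sketch with invented sprint data:

```python
import statistics

# Story points completed per sprint -- made-up numbers for illustration
velocities = [21, 19, 23, 20, 22]

avg = statistics.mean(velocities)
consistent = all(abs(v - avg) <= 0.2 * avg for v in velocities)

print(avg)         # 21.0 -> average team velocity [story points / sprint]
print(consistent)  # True -> every sprint within +/- 20% of the average
```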
Resource Measure – Ex. 1: ’Programming Skill’
• Entity: Team Member
• Attribute: Programming Skill
• Unit: Programming Test Score (PTS)
• Scale Type: Ordinal
• Range: {0, 1, 2, 3, 4, 5} – NB: Each number needs explanation!
• Who collects/reports the data? Skill Test Agency
• When (how often) is the data collected? Whenever a test is conducted
• How is the data collected? Programming Test
• Who is responsible for data validity? Certification Agency
Resource Measure – Ex. 2: ’Personality’
• Entity: Team Member
• Attribute: Personality
• Unit: Myers-Briggs Type Indicator (MBTI)
• Scale Type: Nominal
• Range: Set of 16 Types: ISTJ, ISFJ, INFJ, ..., ENTJ
• Who collects/reports the data? Test Agency
• When (how often) is the data collected? Whenever a test is conducted
• How is the data collected? MBTI Instrument
• Who is responsible for data validity? Certification Agency
Structure of Lecture 7
• Motivation and Definitions (Measure, Measurement)
• Example Measures (Process, Product, Resource)
• Subjective Measurement
Objective vs. Subjective Measurement
• Objective Measurement
  – Usually, the measurement process can be automated
  – (Almost) no random measurement error, i.e., the process is perfectly reliable
• Subjective Measurement
  – Human involvement in the measurement process
  – If we repeat the measurement of the same object(s) several times, we might not get exactly the same measured value every time, i.e., the measurement process is not perfectly reliable
Rule of Thumb:
• Subjective measures have proven to be useful – but if an objective measure is available, then it is (usually) preferable
Procedures for Subjective Measurement
• Subjective measures usually entail a well-defined Measurement Procedure that precisely describes:
  – How to collect the data (usually via questionnaires on paper or online)
  – How to conduct interviews
  – How to review documents (software artifacts)
  – In which order to assess the dimensions/items of the data collection instrument, etc.
• Examples: ISO 9000 Audit, CMMI/SPICE Assessment, Function Points
Objective vs. Subjective Measurement
Examples:
• Subjective Measurement
– Classification of defects into severity classes
– Function Points (when counted manually)
– Software Process Assessments
• Objective Measurement
– Lines of Code
– Cyclomatic Complexity
– Memory Size
– Test Coverage
Basic Concepts in Subjective Measurement
• Construct: A conceptual object that cannot be directly observed and therefore cannot be directly measured (i.e., we estimate the quantity we are interested in rather than directly measure it); for example:
  – User Satisfaction
  – Competence of a Software Engineer
  – Efficiency of a Process
  – Maturity of an Organization
• Item: A subjective measurement scale that is used to measure a construct
  – A question on a questionnaire is an item
[Figure: a Construct measured via a Measurement Instrument consisting of items Item1 … Itemn]
Dimensionality of Constructs
• Constructs can be one-dimensional or multi-dimensional
• If a construct is multidimensional, then each dimension covers a
different and distinct aspect of the construct
–e.g., the different dimensions of customer satisfaction
[Figure: a one-dimensional Construct measured by items Item1 … Itemn]
Likert Type Scales
• Evaluation-type
  Example: “Familiarity with and comprehension of the software development environment”
  Response options: Little – Unsatisfactory – Satisfactory – Excellent
• Frequency-type
  Example: “Customers provide information to the project team about the requirements”
  Response options: Never – Rarely – Occasionally – Most of the time
• Agreement-type
  Example: “The tasks supported by the software at the customer site change frequently”
  Response options: Strongly Agree – Agree – Disagree – Strongly Disagree
Semantic Differential Scale
• Items which include semantic opposites
• Example:
“Processing of change requests to existing systems or services:
the time that MIS staff takes until responding to change requests
received from users of existing computer-based information
systems or services.”
Slow □ □ □ □ □ □ □ Fast
Timely □ □ □ □ □ □ □ Untimely
Assigning numbers to scale responses
• Likert-Type Scales:
  Strongly Agree -> 1
  Agree -> 2
  Disagree -> 3
  Strongly Disagree -> 4
  – Ordinal scale
  – But: often the distances between the four response categories are approximately (conceptually) equidistant, so the values are treated like an approximate interval scale.
• Semantic Differential Scale:
  Slow □ □ □ □ □ □ □ Fast
       1 2 3 4 5 6 7
  – Ordinal scale, but again often treated as an interval scale
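The scale-type caveat matters as soon as responses are aggregated; a sketch with invented response data:

```python
import statistics

# Agreement-type Likert responses, coded as above -- made-up sample
codes = {"Strongly Agree": 1, "Agree": 2, "Disagree": 3, "Strongly Disagree": 4}
responses = ["Agree", "Agree", "Disagree", "Strongly Agree", "Agree"]
values = [codes[r] for r in responses]

print(statistics.median(values))  # 2   -- always defensible (ordinal scale)
print(statistics.mean(values))    # 2.0 -- only under the approximate-interval assumption
```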
Reliability versus Validity
Assume you measure several times the same attribute of an entity (say, complexity of a code module) and the centre point is the true (but unknown) value.
http://www.uni.edu/chfasoa/reliabilityandvalidity.htm
Reliability versus Validity
Assume you measure several times the same attribute of an entity (say, complexity of a code module) and the centre point is the true (but unknown) value.
Not reliable: too much random bias (noise).
Not valid: too much systematic bias.
Reliability Estimation Techniques – Classes
• Number of administrations: the number of times that the same object is measured (per observer / rater)
• Number of instruments: the number of different but equivalent instruments that would need to be administered

                        | One Instrument                     | Two Instruments
One Administration      | Inter-Rater; Internal Consistency  | Parallel Forms (immediate)
(per Observer / Rater)  |                                    |
Two Administrations     | Test-Retest                        | Parallel Forms (delayed)
http://www.socialresearchmethods.net/kb/reltypes.php
Inter-Rater Agreement vs.
Internal Consistency
• Example
[Figure: 4 books rated by 2 reviewers (R1, R2) with 1 instrument of 4 items: Readability (little … much), Suspense (bad … good), Length (short … long), Weight (light … heavy), each rated on a 1–5 scale; plus an overall Quality rating]
Inter-Rater Agreement vs.
Internal Consistency
Example – Data:
R1:
Book 1: Q: - R: 2 - S: 3 - L: 3 - W: 3
Book 2: Q: - R: 4 - S: 3 - L: 2 - W: 2
Book 3: Q: - R: 2 - S: 3 - L: 1 - W: 2
Book 4: Q: - R: 4 - S: 5 - L: 4 - W: 3
R2:
Book 1: Q: - R: 3 - S: 3 - L: 3 - W: 3
Book 2: Q: - R: 3 - S: 4 - L: 3 - W: 2
Book 3: Q: - R: 2 - S: 1 - L: 2 - W: 2
Book 4: Q: - R: 4 - S: 4 - L: 3 - W: 3

Inter-rater Agreement (Readability):
R1: R: 2 – 4 – 2 – 4
R2: R: 3 – 3 – 2 – 4
Fleiss’ Kappa = 0.33 (fair agreement)
Quality rating tallies (categories 1-2-3-4): Book 1: 0 1 1 0 | Book 2: 0 0 1 1 | Book 3: 0 0 2 0 | Book 4: 0 0 0 2

Average Inter-Item Correlation:
    R     S     L     W
R   1
S   0.66  1
L   0.51  0.64  1
W   0.29  0.46  0.73  1
Avg = 0.55
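For two raters, a chance-corrected agreement coefficient can be computed directly. The sketch below uses Cohen's two-rater formulation, which reproduces the 0.33 quoted on the slide for the readability ratings (an assumption about how that number was obtained; Fleiss' kappa generalizes this idea to more than two raters):

```python
from collections import Counter

def kappa_two_raters(r1, r2):
    # Chance-corrected agreement: (p_o - p_e) / (1 - p_e)
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance, from each rater's category frequencies
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

r1 = [2, 4, 2, 4]  # R1's readability ratings of books 1-4
r2 = [3, 3, 2, 4]  # R2's readability ratings of books 1-4
print(round(kappa_two_raters(r1, r2), 2))  # 0.33 -> fair agreement
```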
Next Lecture
• Topic: SPI & Measurement – Part B
• For you to do:
– Have a look at the PROFES Quick Reference and
Manual -> What does it say about measurement?
– Finish and submit Homework 2
• Deadline: March 16, 20:00 (sharp!)
– Prepare your short presentation (March 25)
• Submit slides (max 3) at the latest by March 24 (23:59)