Cigdem gencel persistence in poor estimating in software engineering- whys and hows v04
Persistence in Poor Estimating in Software Engineering:
Whys and Hows
Çiğdem Gencel, Assist. Prof.
Free University of Bolzano/Bozen (Italy)
Faculty of Computer Science
Oxford University, UK
11 June 2014
Agenda
Introduction
The Estimating Problem
WHY? Fundamental Issues
HOW?
What are the Basic Needs?
Conclusions
Open Discussion
What is CS and SE?
Software Engineering: The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software [IEEE Std 610.12-1990]
Computer Science: Study of information and computation, and of practical techniques for using machines to process information and perform computation
Subjective opinions
Objective truth
Why Measure?
We measure to understand, to predict, to control and to improve
What is Measurement?
Entity | Attribute | Measure (Metric)
Code | Length | 10,000 Lines of Code
Example of the entity (code): If A>B then begin A - B end else begin A + B end;
“The process by which numbers and symbols are assigned to attributes of entities in the real world so as to describe them according to clearly defined rules.” - Fenton, 1991
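The entity/attribute/measure triple above can be sketched in a few lines. This is a minimal illustration, not a standard counting method: the counting rule (one unit per non-blank line) and the function name `count_lines_of_code` are assumptions chosen for the example, and a different "clearly defined rule" would give a different number for the same code.

```python
def count_lines_of_code(source: str) -> int:
    """Measure the 'length' attribute of a 'code' entity.

    Assumed counting rule (hypothetical): every non-blank line counts as one.
    """
    return sum(1 for line in source.splitlines() if line.strip())


# The code fragment shown on the slide, used here as the entity to measure.
snippet = """If A>B
then
begin
A - B
end
else
begin
A + B
end;"""

length = count_lines_of_code(snippet)
print(f"Entity: code, attribute: length, measure: {length} lines of code")
```

Changing the rule (for example, counting statements instead of lines) changes the measure, which is why the definition insists on clearly defined rules.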
1 Schalken, J., and van Vliet, H., "Measuring where it matters: Determining starting points for metrics collection", Journal of Systems and Software, 81, 5, May 2008, p. 603-615
Exploratory Cycle Confirmatory Cycle
Empirical investigation in software engineering consists of exploratory and confirmatory cycles that are iterative in nature¹
Empirical Investigations in SE (I)
Folk Proverbs for Weather Forecast
UK
“Red sky at night, sailor's delight; Red sky at morning, sailors take warning”
ITALY
IT: “Rosso tramonto, bianco mattino”
EN: Red sunset, white morning
IT: “Rosso di mattina, il mal tempo s'avvicina”
EN: Rosy in the morning, bad weather is coming
Italian proverbs source: http://www.italyrevisited.org/photo/Folk_Sayings_on_Nature
Photo source: http://www.wikihow.com/Predict-the-Weather-Without-a-Forecast
The exploratory cycle usually starts with unstructured observations
Folk Proverbs for Weather Forecast
UK
“Circle around the moon, rain or snow soon”
ITALY
IT: Quannu la luna e pallita, chiovi; quannu e russa, fa ventu e quann'e chiara fa sirinita.
EN: When the moon is pale, it will rain; when it is reddish, it will be windy and when it is clear the weather will be pleasant
Preliminary insights lead to hypothesis generation and more structured observations
Operational measures are selected/defined to test the hypothesis in the confirmatory cycle
Exploratory Cycle Confirmatory Cycle
Empirical Investigations in SE (II)
Controlled Experiments
Surveys
Case Studies
Interviews
Measurement is necessary for collecting evidence during empirical inquiries
A sundial on a church at North Lake Garda (Italy). As the sun moves across the sky, shadows change in direction and length, so a sundial can measure the length of a day with respect to different times of the year
Various measures and measurement instruments have been developed throughout history
What is Estimation?
y = f(parameter_1, parameter_2, ..., parameter_n)
MEASUREMENT : NOW
E.g. Temperature, Pressure, etc.
ESTIMATION: FUTURE
E.g. Simple or sophisticated weather forecast models
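The distinction can be sketched as follows: measurements taken now (or in the past) are used to calibrate a function f, and the estimate is what f returns for a case that has not happened yet. The example below assumes the simplest possible form, a one-parameter straight line fitted by ordinary least squares; the function names and all data values are hypothetical and only illustrate the idea.

```python
def fit_line(xs, ys):
    """Calibrate y = intercept + slope * x from past measurements
    using ordinary least squares (assumed model form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate(intercept, slope, x):
    """Estimate a future y from a parameter value x."""
    return intercept + slope * x


# MEASUREMENT (now/past): hypothetical observed pairs (parameter, outcome).
measured_x = [1.0, 2.0, 3.0, 4.0]
measured_y = [3.0, 5.0, 7.0, 9.0]

# ESTIMATION (future): apply the calibrated model to an unseen value.
intercept, slope = fit_line(measured_x, measured_y)
print(estimate(intercept, slope, 5.0))
```

A more sophisticated model only changes f and the number of parameters; it still depends on well-defined measurements as its inputs.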
History of Base Measures and Instruments for Weather Predictions
Humidity Measurement
1400s - da Vinci: First primitive hygrometer
1664 - Folli: First practical hygrometer
1820 – Daniell: First dew point hygrometer using electrical resistance
Wind Measurement
1450 – Alberti: First anemometer
1805 – Beaufort: Beaufort Scale to visually estimate wind speed
1846 – Robinson: First four-cup anemometer
Temperature Measurement
1593 – Galileo: First water thermometer
1714 – Fahrenheit: Mercury thermometer with Fahrenheit scale
1743 – Celsius: Mercury thermometer with Celsius scale
1848 – Kelvin: Kelvin Scale (with absolute zero as -273 °C)
Pressure Measurement
1644 – Torricelli: Torricelli tube
1843 – Vidie: Metallic barometer
“Measure what can be measured, and make measurable what cannot be measured.” - Galileo Galilei
Modern Weather Forecast Models
In other cases, we might need more accuracy and therefore, more sophisticated models
Significance of the Problem
Annual cost of failures and over-runs:
• US market (Standish) ~100 Billion US$
• European market ~100 Billion €
Study/Country | No. of Projects | Cost | Over-runs/Write-offs
UK Public Sect. | 105 | £29B | £9B (31%)
Mostly US | 1471 | $246B | $66B (27%)
2 Symons, C., Gencel, C., From Requirements to Project Effort Estimates – Work in Progress (Still?), REFSQ Annual Conference, Industry Track Keynote, Germany, 2013
Software industry records show that projects are often delivered late and/or over budget²
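The percentages quoted for the two studies follow directly from the reported totals. A small check, using only the figures shown on the slide (amounts in billions; the variable and function names are chosen for this sketch):

```python
# (total cost, over-runs/write-offs) in billions, as reported on the slide.
studies = {
    "UK Public Sect. (GBP)": (29, 9),
    "Mostly US (USD)": (246, 66),
}


def overrun_share(total_cost, overrun):
    """Over-runs/write-offs as a whole-number percentage of total cost."""
    return round(100 * overrun / total_cost)


for name, (total_cost, overrun) in studies.items():
    print(f"{name}: {overrun_share(total_cost, overrun)}% of cost")
```

Both results match the slide's figures: roughly 31% for the UK public sector study and 27% for the mostly-US study.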
Three major shifts in SE
Shift 1: Agility
Shift 2: GSE
Shift 3: Scale
The shift towards agility in development, the distribution of tasks across borders, and the increase in scale have created more challenges³
3 Gencel, C., Petersen, K., Opening presentation of the 1st Intern. Workshop on Estimations in the 21st Century Software Engineering (EstSE21), The Agile Conference (XP 2014), Rome, Italy, 2014
An Example from UK (I)
Over 20 years ago there was a lot of interest in software metrics (Norman Fenton wrote his book, the Government adopted metrics, UKSMA started)
Then there was a lot of outsourcing to the big international software houses, who moved a lot of work off-shore to low-cost countries.
This had two consequences:
◦ there were big cost savings, so why bother to measure supplier performance?
◦ the customers lost all their knowledge of measurement to the suppliers (with the staff that they passed over to the suppliers)
Source of Information: Charles Symons, President of the Common Software Measurement International Consortium (COSMIC)
An Example from UK (II)
More recently, off-shore costs have risen, so software development work is starting to come back to low-cost regions of the UK
Simultaneously, there is more interest in Agile development
◦ Agile requires small cohesive teams, which is difficult to achieve when e.g. the team is spread over the US, the UK and, say, India. So quality concerns have arisen
Currently, there are signs of growing interest in metrics again to be able to manage these situations.
Source of Information: Charles Symons, President of the Common Software Measurement International Consortium (COSMIC)
WHY Poor Estimations?
I. Lack of well-established taxonomies/categories
II. Ill-defined attributes / measures
III. Lack of standardization
Product categories
Building
Apartment
Low rise
Mid rise
High rise
Airport
Hospital
…
Bridge
Motorway
Highway
…
…
Parameters measured with different metrics
◦ Site work (m² of site area)
◦ Foundations and columns (m²)
◦ Conveying system (# of floor stops)
◦ …
Measurement of Engineering Products
Various parametric systems exist for different types of civil engineering projects
Types of Software Systems
In software engineering, there is no commonly agreed classification of software types
ISO TR 14143-5 CHAR Method - Functional Domain Types | ISO 12182 Software Types
Pure Data Handling System | (no corresponding type)
Information System | Management Information System (Business transaction processing), Decision Support
Data Processing System | Word Processing, Geographic Information System
Controlling Information System | (no corresponding type)
Controlling Data System | Automated Teller Banking
Complex Controlling Information System | Business (Business Enterprise)
Non-Specific (Complex) System | Military Command and Control
Simple Control System | Real Time: Embedded, Device Driver
Control System | (no corresponding type)
Complex Control System | Real Time: Embedded, Avionics, Message router
Data Driven Control System | E-mail, Emergency dispatch call/receipt, Oper. Syst.
Complex Data Driven Control System | Process Control (Control System)
Pure Calculation System | Scientific, Standard math/Trig. Algorithms
Controlling Calculation System | Engineering
Scientific Information System | Self-learning (Expert or Artificial Intelligence), Statistical, Spreadsheet, Secure Systems, Actuarial
Scientific Controlling Data Processing System | Safety Critical
Inconsistent Classifications in SE
Each software benchmark dataset has its own attributes
Categories are not well-established and not orthogonal
Application Types in an Example Dataset
Customer billing/relationship management; Business;
Customer billing/relationship management; Document management; Trading;
Customer billing/relationship management; CRM;
Customer billing/relationship management; Document management; Trading;
Customer billing/relationship management; Financial transaction process/accounting; Online analysis and reporting; Trading; Workflow support & management; Process Control; Electronic Data Interchange;
Customer billing/relationship management; Logistic or supply planning & control;
Customer billing/relationship management; Other;
Customer billing/relationship management; Stock control & order processing.
4 Gencel, C., Buglione, L., Abran, A., “Improvement Opportunities and Suggestions for Benchmarking”, Intern. Workshop on Software Measurement and Mensura Joint Conference, 2009
Size of a building
In civil engineering, different size measures are defined to measure the size of buildings
◦ Floor area (length x width of the floor) (m²) and height (m)
◦ Volume of a building (length x width x height) (m³)
The selection depends on the needs of the engineers or managers!
How about Size of Software?
Information processing amount
It is common for companies to use a ‘one size fits all’ approach!
Measurement in Physical Sciences
bit?
Base Measure | SI unit | Symbol
length | meter | m
mass | kilogram | kg
time | second | s
electric current | ampere | A
thermodynamic temperature | kelvin | K
amount of substance | mole | mol
luminous intensity | candela | cd
• 7 base units are defined to measure physical quantities
• 22 derived units are defined in terms of the base units via a system of quantity equations
Source: NIST website: http://physics.nist.gov/cuu/Units/units.html
The foundation for the International System of Units (SI) was laid during the French Revolution (1799)
Measurement in Social Sciences
In social sciences, there are no standard units of measurement
The theory and practice of measurement is studied in psychometrics
Measurement in Computer Science
Factor | Name | Symbol | Origin | Derivation
2^10 | kibi | Ki | kilobinary: (2^10)^1 | kilo: (10^3)^1
2^20 | mebi | Mi | megabinary: (2^10)^2 | mega: (10^3)^2
2^30 | gibi | Gi | gigabinary: (2^10)^3 | giga: (10^3)^3
2^40 | tebi | Ti | terabinary: (2^10)^4 | tera: (10^3)^4
2^50 | pebi | Pi | petabinary: (2^10)^5 | peta: (10^3)^5
2^60 | exbi | Ei | exabinary: (2^10)^6 | exa: (10^3)^6
Source: NIST website: http://physics.nist.gov/cuu/Units/units.html
In 1998, the IEC approved prefixes for binary multiples for use in the fields of data processing and data transmission
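The practical effect of the binary prefixes is easiest to see by comparing them with the decimal SI prefixes of similar name. The sketch below is illustrative; the dictionary and function names are chosen for this example, and quantities are expressed in bytes.

```python
# Binary prefixes: powers of 2^10.
BINARY_PREFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

# Decimal SI prefixes: powers of 10^3.
DECIMAL_PREFIXES = {
    "k": 10**3, "M": 10**6, "G": 10**9,
    "T": 10**12, "P": 10**15, "E": 10**18,
}


def to_bytes(value, prefix):
    """Convert a prefixed quantity (e.g. 1 'Gi') to a plain number of bytes."""
    factor = BINARY_PREFIXES.get(prefix) or DECIMAL_PREFIXES.get(prefix)
    if factor is None:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return value * factor


gibibyte = to_bytes(1, "Gi")
gigabyte = to_bytes(1, "G")
print(gibibyte, gigabyte, gibibyte - gigabyte)
```

One gibibyte is 1,073,741,824 bytes, about 7% more than one gigabyte; the gap widens with each larger prefix, which is why an unambiguous unit definition matters.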
Recent Attempts for Standardization: A Standard on Functional Size Measurement⁵
Part 1 (1998) : Functional Size Measurement Concepts
◦ IEEE Std. 14143-1 (2000) adopted ISO/IEC 14143-1:1998
Part 2 (2002) : Conformity evaluation of software size measurement methods to ISO/IEC 14143-1:1998
Part 3 (2003) : Verification of Functional Size Measurement Methods
Part 4 (2002) : FSM – Reference Model
Part 5 (2004) : Determination of Functional Domains for use with Functional Size Measurement
Part 6 (2005) : Guide for Use of ISO 14143 Series and Related International Standards
5 ISO/IEC 14143: Information Technology – Software Measurement – Functional Size Measurement
In 1998, ISO established a working group to define the base concepts of functional size measurement
Standardized Functional Size Measurement (FSM) Methods
IFPUG Function Point Analysis (ISO/IEC 20926)
Mark II Function Point Analysis (ISO/IEC 20968)
NESMA FSM Method (ISO/IEC 24570)
COSMIC Function Points (ISO/IEC 19761)
FISMA FSM (ISO/IEC 29881)
Among the five standardized FSM methods, only COSMIC was designed to measure ‘pure functional size’, whereas the others are actually designed to estimate ‘relative effort’⁶
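To make ‘pure functional size’ concrete: in COSMIC, the size of a functional process is the number of its data movements (Entry, Exit, Read, Write), each counted as 1 COSMIC Function Point (CFP), with no weighting for effort or complexity. The sketch below is illustrative only; the example process and its list of movements are hypothetical, and an actual measurement would follow the rules of the COSMIC method rather than this simplified count.

```python
DATA_MOVEMENT_TYPES = {"Entry", "Exit", "Read", "Write"}


def cosmic_size(data_movements):
    """Functional size in CFP: one unit per identified data movement."""
    unknown = set(data_movements) - DATA_MOVEMENT_TYPES
    if unknown:
        raise ValueError(f"not a COSMIC data movement type: {sorted(unknown)}")
    return len(data_movements)


# Hypothetical functional process "create order":
# receive the order data, look up the customer, store the order, confirm.
create_order = ["Entry", "Read", "Write", "Exit"]
print(cosmic_size(create_order), "CFP")
```

Because every movement counts the same, the result describes how much functionality is delivered, not how hard it is to build; effort is derived separately from size.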
6 Gencel, C., Symons, C., “From performance measurement to project estimating using COSMIC functional sizing”, in the Proc. of Software Measurement European Forum (SMEF), Rome, 2009
HOW to Improve?
These make it difficult to investigate relationships and rules among different attributes
Persistence of Poor Predictions!
Accurate Predictions
Well-Defined & Standard measures / instruments
Good Categories/Taxonomies