Experimental Software Engineering

Experimental Software Engineering Prof. Marcos Kalinowski [email protected]

Transcript of Experimental Software Engineering

Page 1: Experimental Software Engineering

Experimental Software Engineering

Prof. Marcos Kalinowski [email protected]

Page 2: Experimental Software Engineering

Introductions

Marcos Kalinowski
• Software Engineering Professor at PUC-Rio
• Member of ISERN
• Main research interests:

– Empirical Software Engineering
– Software Quality Improvement

• Further information:
– www.inf.puc-rio.br/~kalinowski

• Who are you?
– Background, interests, ...

Marcos Kalinowski 2Experimental Software Engineering

Page 3: Experimental Software Engineering

• Discipline topics:
– Experimental Software Engineering: Overview and Research Opportunities

– Empirical Strategies

– Measurement Concepts

– Systematic Literature Reviews and Mapping Studies

– Surveys

– Case Studies

– Controlled Experiments
• Experiment Process: Scoping, Planning, Operation, Analysis and Interpretation, Presentation and Package

– Design Science Research

– Qualitative Methods

– Theory Building

Marcos Kalinowski 3Experimental Software Engineering

Experimental Software Engineering

Page 4: Experimental Software Engineering

• Assessment
– Evaluation 1 = Topic presentation and participation in classroom discussions
– Evaluation 2 = Secondary study plan
– Evaluation 3 = Primary study plan
– Evaluation 4 = Paper with 8 to 16 pages in Springer LNCS format

Grade = (Evaluation 1 + Evaluation 2 + Evaluation 3 + (2x Evaluation 4)) / 5

Success
– (Presence >= 75%) AND (Grade >= 6)
Fail
– Otherwise

Marcos Kalinowski Experimental Software Engineering 4

Experimental Software Engineering

Page 5: Experimental Software Engineering

Experimental Software Engineering

• Textbook
– Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.

• Additional references
– Kitchenham, B.A., Budgen, D., Brereton, P., Evidence-Based Software Engineering and Systematic Reviews, Chapman and Hall/CRC, 2015.

– Kitchenham, B.A., Charters, S., Guidelines for performing systematic literature reviews in software engineering. Technical Report EBSE 2007–001, Keele University and Durham University Joint Report, 2007.

– Runeson, P., Höst, M., Rainer, A.W., Regnell, B., Case Study Research in Software Engineering – Guidelines and Examples. Wiley, 2012.

– Wieringa, R., Design Science Methodology for Information Systems and Software Engineering. Springer, 2014.

– Scientific Papers

Marcos Kalinowski 5Experimental Software Engineering

Page 6: Experimental Software Engineering

Experimental Software Engineering

• Important Dates
– 30/04 – Deadline for delivering the secondary study plan

– 11/06 – Deadline for delivering the primary study plan

– 02/07 – Deadline for delivering the paper

• Others
– 23/04 – Holiday

Marcos Kalinowski 6Experimental Software Engineering

Page 7: Experimental Software Engineering

INTRODUCTION

Marcos Kalinowski 7 Experimental Software Engineering

Page 8: Experimental Software Engineering

Introduction

• The story of the Denver International Airport ...

8 Marcos Kalinowski Experimental Software Engineering

DEMARCO, T.; LISTER, T. (2003) Waltzing with Bears – Managing Risk on Software Projects. Dorset House. (ISBN: 978-0932633606).

“Software Engineering discipline remains years – perhaps decades – short of the mature engineering discipline needed to meet the demands of an information age society”.

Page 9: Experimental Software Engineering

Silver Bullets in Software Engineering?

9Marcos Kalinowski Experimental Software Engineering

Page 10: Experimental Software Engineering

Introduction

• Software development depends on different technologies

– Usually there is no evidence available concerning:
• Benefits

• Limitations

• Risks

10Marcos Kalinowski Experimental Software Engineering

Page 11: Experimental Software Engineering

Introduction

• During their projects, software engineers need to answer questions like:

– Which software technology should I consider for my project?

– How much training/investment is needed to introduce the technology into my process?

– When and how can I observe the return on investment?

– Under which circumstances does the technology present the best performance?

11Marcos Kalinowski Experimental Software Engineering

Page 12: Experimental Software Engineering

• We need to have knowledge on our software technologies (methods, techniques and tools) to understand the situations in which they really work, their limits and how we can evolve them. (Basili, 1996)

Marcos Kalinowski Experimental Software Engineering 12

BASILI, V. R. (1996) The role of experimentation in software engineering: past, current, and future. IEEE International Conference on Software Engineering (ICSE), pp. 442-449.

Introduction

Page 13: Experimental Software Engineering

Obtaining Knowledge

• Building theories, models, experimentation and learning

– Understanding a discipline involves building theories and models

– To verify whether our understanding is correct, we need to:
• Conduct experiments on our theories and models

13Marcos Kalinowski Experimental Software Engineering

Page 14: Experimental Software Engineering

Obtaining Knowledge

• Building theories, models, experimentation and learning

– Understanding a discipline involves building theories and models

– To verify whether our understanding is correct, we need to:
• Conduct experiments on our theories and models

14Marcos Kalinowski Experimental Software Engineering

Experimentation is fundamental to both academia and industry!

Page 15: Experimental Software Engineering

Software Engineering

• Software Engineering involves development and is not manufacturing

– It involves reasoning and human elements (e.g., developers)

– There are several variables that can lead to differences in measurements

• Current Scenario:

– Limited amount of theories and models

– Lack of knowledge on the limits of existing technologies for certain development contexts

15Marcos Kalinowski Experimental Software Engineering

Page 16: Experimental Software Engineering

Experimental Software Engineering

• Experimental Studies

– Discovering something or testing hypotheses

– May involve different types of analysis: quantitative and/or qualitative

• Studies may be:

16

Primary / Secondary (aggregate results of primary studies)

Marcos Kalinowski Experimental Software Engineering

Page 17: Experimental Software Engineering

Experimental Software Engineering

• Experimental Studies

– Discovering something or testing hypotheses

– May involve different types of analysis: quantitative and/or qualitative

• Studies may be:

17

Primary / Secondary (aggregate results of primary studies)

Marcos Kalinowski Experimental Software Engineering

Measuring Variables

Page 18: Experimental Software Engineering

Experimental Software Engineering

• Experimental Studies

– Discovering something or testing hypotheses

– May involve different types of analysis: quantitative and/or qualitative

• Studies may be:

18

Primary / Secondary (aggregate results of primary studies)

Marcos Kalinowski Experimental Software Engineering

Understanding causes and effects of collected data

Page 19: Experimental Software Engineering

Classification of Experiments

– In Vivo: no model needed
– In Vitro: the environment needs to be modelled
– In Virtuo: computational models of the object and the environment
– In Silico: computational models of the participant behaviour, object and environment

19 Marcos Kalinowski Experimental Software Engineering

TRAVASSOS, G. H.; BARROS, M. O. (2003) Contributions of In Virtuo and In Silico Experiments for the Future of Empirical Studies in Software Engineering. In: 2nd Workshop on Empirical Software Engineering: The Future of Empirical Studies in Software Engineering, 2003, Rome.

Page 20: Experimental Software Engineering

Required Reading

• Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.
– Chapter 1 – Introduction

– Chapter 2 – Empirical Strategies

Marcos Kalinowski 20Experimental Software Engineering

Page 21: Experimental Software Engineering

PRIMARY STUDIES

21Marcos Kalinowski Experimental Software Engineering

Page 22: Experimental Software Engineering

Primary Study Types

• Controlled Experiment

– An experiment that allows controlling and manipulating variables.

• Case Study

– Investigates a phenomenon in a real context. Typically conducted during software development or maintenance projects. Part of the behavior cannot be manipulated.

22Marcos Kalinowski Experimental Software Engineering

Page 23: Experimental Software Engineering

• Survey

– Conducted after a fact has occurred, aiming at identifying some evidence.

– Does not allow control.

• Action Research

– Research method that combines theory (research) and practice (action), bringing together researchers and practitioners to solve a problem.

23Marcos Kalinowski Experimental Software Engineering

Primary Study Types

Page 24: Experimental Software Engineering

Controlled Experiment

Characteristics

– Investigate testable hypotheses

– Independent variables are manipulated to measure their effects on dependent variables

24Marcos Kalinowski Experimental Software Engineering

Page 25: Experimental Software Engineering

Examples:

– Which technique is more effective for software inspection: checklist-based reading or perspective-based reading?

25Marcos Kalinowski Experimental Software Engineering

Controlled Experiment

Page 26: Experimental Software Engineering

[Figure (Wohlin et al., 2012): at the theory level, a cause construct relates to an effect construct; at the observation level, a treatment (independent variable) applied in the experiment operation produces a result measured by the dependent variable.]

Marcos Kalinowski Experimental Software Engineering 26

WOHLIN, C., RUNESON, P., HÖST, M., OHLSSON, M., REGNELL, B., WESSLÉN, A. (2012) Experimentation in Software Engineering. Springer.

Controlled Experiment

Page 27: Experimental Software Engineering

Threats to Validity

• Results of experiments should be reported considering their validity

– Internal

– External

– Construct

– Conclusion

Marcos Kalinowski Experimental Software Engineering 27

BIFFL, S.; KALINOWSKI, M.; EKAPUTRA, F.; ANDERLIN-NETO, A.; CONTE, T.; WINKLER, D. (2014) Towards a semantic knowledge base on threats to validity and control actions in controlled experiments. In: 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Torino, Italy.

Page 28: Experimental Software Engineering

Experiment Process

Scoping

Planning

Operation

Analysis

Marcos Kalinowski 28Experimental Software Engineering

Page 29: Experimental Software Engineering

• Scoping
– Identification of the study goals
– Identification of the objects and groups

• Planning
– Formulation of hypotheses
– Identification of dependent variables (response variables)
– Identification of independent variables (factors)
– Selection of subjects
– Experiment design
– Selection of the analysis methods
– Instrumentation
– Validity evaluation (threats to validity)

Marcos Kalinowski 29Experimental Software Engineering

Experiment Process

Page 30: Experimental Software Engineering

• Operation
– Training and preparation
– Execution of the study by the participants

• Analysis
– Descriptive statistics
– Graphical visualization
– Elimination of outliers
– Analysis of the distribution
– Statistical hypothesis testing

• Packaging
– Presentation of the results
– Preparation of the package to repeat the study

Marcos Kalinowski 30Experimental Software Engineering

Experiment Process

Page 31: Experimental Software Engineering

Case Study
• Definition:

“A method that investigates a phenomenon within its real context, especially when the boundaries and/or the context of the phenomenon are not well defined”

• Mainly used when controlled experiments are not possible, because:
– The context is important and difficult to separate from the problem or to simulate
– Several effects are expected and observing them might require a longer period of time

31

RUNESON, P., HOST, M., RAINER, A., REGNELL, B. (2012) Case Study Research in Software Engineering: Guidelines and Examples. John Wiley & Sons.

Page 32: Experimental Software Engineering

Case Study

Types of Case Studies
– Exploratory

• Used in initial investigations of phenomena

• Aims at deriving new ideas and hypotheses (formulating theories)

– Descriptive

• Describes a situation or phenomenon

– Explanatory

• Searches for an explanation for a situation or problem

• Mainly, but not necessarily, in the form of a causal relationship

– Confirmatory

• Used to test/refute theories

32Marcos Kalinowski Experimental Software Engineering

RUNESON, P., HOST, M., RAINER, A., REGNELL, B. (2012) Case Study Research in Software Engineering: Guidelines and Examples. John Wiley & Sons.

Page 33: Experimental Software Engineering

Survey

• Retrospective study (descriptive, explanatory, or exploratory) aiming at identifying characteristics and/or opinions of a large population

• Representative sample selection for a certain population plays a key role in survey research

– Data analysis techniques are used to generalize from the sample to the population

33Marcos Kalinowski Experimental Software Engineering

Page 34: Experimental Software Engineering

Action-Research

• Characteristics:

– The researcher intervenes in the study object with the purpose of improving it

• Goals:

– Promote improvements, and

– Contribute to scientific knowledge

34Marcos Kalinowski Experimental Software Engineering

SANTOS, P.S.M.; TRAVASSOS, G.H.; ZELKOWITZ, M.V. (2011) Action research can swing the balance in experimental software engineering, Advances in Computers, vol. 83, 205-276.

Page 35: Experimental Software Engineering

Comparison of the Primary Studies

Marcos Kalinowski Experimental Software Engineering 35

WOHLIN, C., RUNESON, P., HÖST, M., OHLSSON, M., REGNELL, B., WESSLÉN, A. (2012) Experimentation in Software Engineering. Springer.

Page 36: Experimental Software Engineering

Exercises

Marcos Kalinowski Experimental Software Engineering 36

WOHLIN, C., RUNESON, P., HÖST, M., OHLSSON, M., REGNELL, B., WESSLÉN, A. (2012) Experimentation in Software Engineering. Springer.

• What is the difference between qualitative and quantitative research?

• What is a survey? Give examples of different types of surveys in software engineering.

• What role do replication and systematic literature reviews play in building empirical knowledge?

• How can the Experience Factory be combined with the Goal/Question/Metric method and empirical studies in a technology transfer context?

• What are the key ethical principles to observe when conducting experiments?

Page 37: Experimental Software Engineering

Required Reading

• Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.
– Chapter 3 – Measurement

• Optional reading:
– Basili, V., Caldiera, G., Rombach, D., Goal Question Metric Paradigm, Encyclopaedia of Software Engineering (Marciniak, J., editor), vol. 1, John Wiley & Sons, 1994, pp. 528-532.

– Basili, V., Trendowicz, A., Kowalczyk, M., Heidrich, J., Seaman, C., Münch, J., Rombach, D. Aligning Organizations through Measurement - The GQM+Strategies Approach. Springer-Verlag, 2014.

– Fenton, N.E.; Bieman, J.; Software Metrics: A Rigorous and Practical Approach; 3rd edition, Kindle edition; Boca Raton, FL: CRC Press Taylor & Francis Group; 2015 ISBN 978-1-4398-3823-5

Marcos Kalinowski 37Experimental Software Engineering

Page 38: Experimental Software Engineering

MEASUREMENT

38Marcos Kalinowski Experimental Software Engineering

Page 39: Experimental Software Engineering

• Basic Concepts
– Scale Types

– Objective and Subjective Measures

– Direct or Indirect Measures

• Measurement in Software Engineering

• Measurement in Practice

• Exercises

Agenda

39Marcos Kalinowski Experimental Software Engineering

Page 40: Experimental Software Engineering

• “You can't control what you can't measure”

Tom DeMarco

• Measure x Measurement x Metric

• Measurement activities need clear goals

Basic Concepts

40Marcos Kalinowski Experimental Software Engineering

Page 41: Experimental Software Engineering

Measurement Goals

• Measurement activities need clear goals

– GQM: characterize, understand, evaluate, predict, improve?

• Goal/Question/Metric GQM (Basili and Rombach)

41Marcos Kalinowski Experimental Software Engineering

Page 42: Experimental Software Engineering

• Nominal
– Least powerful scale, based on nominal classification

– Example: Defect types

• Ordinal
– Ranks entities according to an ordering criterion

– Example: Software complexity levels, Likert scales

Scale Types

42Marcos Kalinowski Experimental Software Engineering

Page 43: Experimental Software Engineering

• Interval
– Used when the distance between two measures is meaningful, but not the value itself

– Example: Temperatures measured in Celsius or Fahrenheit

• Ratio
– If there exists a meaningful zero value and the ratio between two measures is meaningful, a ratio scale can be used

– Example: Effort invested in a development activity

Scale Types

43Marcos Kalinowski Experimental Software Engineering

Page 44: Experimental Software Engineering

• Objective Measures
– There is no judgement in the measurement value; it is therefore only dependent on the object being measured

– Can be measured several times and will always provide the same value, within the measurement error

– Example: Lines of code

• Subjective Measures
– The person making the measurement contributes by making some sort of judgement

– Mostly of nominal or ordinal scale types

– Example: Usability

Objective and Subjective Measures

44Marcos Kalinowski Experimental Software Engineering

Page 45: Experimental Software Engineering

• Direct Measures
– Gathered directly

– Example: Lines of code

• Indirect Measures
– Involve the measurement of other attributes

– Example: Defects/LOC, LOC/hour

Direct or Indirect Measures

45Marcos Kalinowski Experimental Software Engineering

Page 46: Experimental Software Engineering

• Objects of Interest:

– Process
• Activities

– Product
• Artefacts

– Resources
• Human, hardware and software

Measurement in Software Engineering

46Marcos Kalinowski Experimental Software Engineering

Page 47: Experimental Software Engineering

• Internal Attributes
– Obtained directly from the process, product or resource

– Example: Size of a software product

• External Attributes
– Can only be measured with respect to how the object relates to other entities of its environment

– Example: Software reliability

Measurement in Software Engineering

47Marcos Kalinowski Experimental Software Engineering

Page 48: Experimental Software Engineering

• Measurement Approaches

– In software development processes
• Metrics are defined by the SEPG and are then collected for each software development project

• Goal Question Metric Paradigm (GQM).

• Practical Software Measurement (PSM).

– In experimental studies
• Metrics are defined by the researcher and then collected during the study operation phase.

• Goal Question Metric Paradigm (GQM).

Measurement in Practice

48Marcos Kalinowski Experimental Software Engineering

Page 49: Experimental Software Engineering

• Defines a way to plan and execute measurement and analysis activities;

– Starts with the declaration of the measurement Goals;

– From the goals, the Questions that we would like to answer with the data interpretation are defined;

– Finally, from the questions, the Metrics and the data to be collected are defined.

• Example of a real GQM-based Measurement Plan (a hypothetical sketch follows below)

GQM

49Marcos Kalinowski Experimental Software Engineering
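The following slides show a real GQM-based measurement plan. As a minimal illustrative sketch of the structure only (hypothetical goal, questions and metrics, not the plan from the slides), the goal-to-question-to-metric refinement can be captured in a simple data structure:

```python
# Hypothetical GQM plan sketch: one measurement goal refined into questions and metrics.
gqm_plan = {
    "goal": ("Analyze the inspection process for the purpose of evaluation "
             "with respect to defect detection effectiveness "
             "from the point of view of the quality manager "
             "in the context of a hypothetical project P."),
    "questions": {
        "Q1": "What is the defect detection effectiveness of each inspection?",
        "Q2": "How does effectiveness vary with the inspection effort?",
    },
    "metrics": {
        "Q1": ["number of defects found", "estimated total number of defects"],
        "Q2": ["inspection effort (person-hours)", "number of defects found"],
    },
}

if __name__ == "__main__":
    # Print the plan: each question together with the metrics that answer it.
    for qid, question in gqm_plan["questions"].items():
        print(qid, "-", question, "->", gqm_plan["metrics"][qid])
```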

Page 50: Experimental Software Engineering

Marcos Kalinowski Experimental Software Engineering 50

Page 51: Experimental Software Engineering

Marcos Kalinowski Experimental Software Engineering 51

Page 52: Experimental Software Engineering

Marcos Kalinowski Experimental Software Engineering 52

Page 53: Experimental Software Engineering

Examples of Experimental Study Goals

• GQM Template:

“Analyze <object of study> with the purpose of <goal> with respect to <quality focus> from the point of view of the <perspective> in the context of <context>”.

53Marcos Kalinowski Experimental Software Engineering

Page 54: Experimental Software Engineering

Examples of Experimental Study Goals

CARNEIRO, G.; LAIGNER, R.; KALINOWSKI, M.; WINKLER, D.; AND BIFFL, S. Investigating the influence of inspector learning styles on design inspections: Findings of a quasi-experiment. In CIbSE 2017 - XX Ibero-American Conference on Software Engineering, pages 222-235, 2017.

Page 55: Experimental Software Engineering

Analyze the documentation debt related to the use of AR (user stories)

for the purpose of characterizing

with respect to the impacts that it can cause on the project in terms of extra effort and cost

from the viewpoint of the project manager

in the context of an industrial software development project.

Marcos Kalinowski Experimental Software Engineering 55

MENDES, T. S.; DE FREITAS FARIAS, M. A.; MENDONÇA, M. G.; SOARES, H. F.; KALINOWSKI, M.; AND SPÍNOLA, R. O. Impacts of agile requirements documentation debt on software projects: a retrospective study. In Proceedings ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016, pages 1290-1295, 2016.

Examples of Experimental Study Goals

Page 56: Experimental Software Engineering

Examples of Experimental Study Goals

ESTÁCIO, B., OLIVEIRA, R., MARCZAK, S., KALINOWSKI, M., GARCIA, A., PRIKLADNICKI, R., LUCENA, C. Evaluating Collaborative Practices in Acquiring Programming Skills: Findings of a Controlled Experiment. In: Simpósio Brasileiro de Engenharia de Software (SBES), Belo Horizonte, Brazil, 2015.

Page 57: Experimental Software Engineering

Exercises

• What are measure, measurement and metric, and how do they relate?

• Which are the four main measurement scale types?

• What is the difference between a direct and an indirect measure?

• Into which three classes are measurements in software engineering divided?

• What are internal and external attributes and how are they mostly related to direct and indirect measures?

Marcos Kalinowski Experimental Software Engineering 57

Page 58: Experimental Software Engineering

Required Reading

• Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.
– Chapter 4 – Systematic Literature Reviews

• Kitchenham, B., Charters, S., Guidelines for performing systematic literature reviews in software engineering. Technical Report, Keele University and University of Durham, 2007.

• Petersen, K., Vakkalanka, S., Kuzniarz, L., Guidelines for conducting systematic mapping studies in software engineering: An update. Information & Software Technology 64: 1-18, 2015.

• Optional Reading (examples):
– E. Mendes, M. Kalinowski, D. Martins, F. Ferrucci and F. Sarro, Cross- vs. Within-Company Cost Estimation Studies Revisited: An Extended Systematic Review, In: Proc. International Conference on Evaluation and Assessment in Software Engineering (EASE), London, UK, 2014.

– Alves, N. S. R., Mendes, T. S., Mendonça, M. G., Spínola, R. O., Shull, F., Seaman, C. B.: Identification and management of technical debt: A systematic mapping study. Information & Software Technology 70: 100-121 (2016).

Marcos Kalinowski 58Experimental Software Engineering

Page 59: Experimental Software Engineering

SECONDARY STUDIES

59Marcos Kalinowski Experimental Software Engineering

Page 60: Experimental Software Engineering

Knowledge Acquisition in Software Engineering Studies

The experimentation process has a recursive nature

Knowledge acquired in primary studies feeds secondary studies, which enable identifying the need for new primary studies...

Marcos Kalinowski Experimental Software Engineering 60

TRAVASSOS, G. H.; SANTOS, P. S. M.; MIAN, P.; DIAS NETO, A. C.; BIOLCHINI, J. (2008). An Environment to Support Large Scale Experimentation in Software Engineering. In: Proc. of XIII IEEE International Conference on Engineering of Complex Computer Systems, Belfast.

Page 61: Experimental Software Engineering

Secondary Studies

• Secondary studies are studies that review primary studies concerning a specific research question with the goal of providing a research synthesis of the existing evidence.

– Aim at identifying, evaluating and interpreting all relevant results on a given research topic.

– Examples: systematic reviews.

61Marcos Kalinowski Experimental Software Engineering

Page 62: Experimental Software Engineering

Systematic Literature Reviews (SLRs)

• Literature review that aims at being:

– ...fair (not biased)

– ...rigorous (defined process)

– ...open (transparent)

– ...objective (reproducible)

• Used in many research areas

– Social sciences, health and education

– Very common in medicine

62 Marcos Kalinowski Experimental Software Engineering

KITCHENHAM, B.; CHARTERS, S. (2007) Guidelines for performing Systematic Literature Reviews in Software Engineering. Keele University Technical Report - EBSE-2007-01.

Page 63: Experimental Software Engineering

Reasons for Conducting Reviews

• Academia:
– Experimental characterization of different technologies.

– Repetition of studies in different contexts to acquire knowledge incrementally.

• Industry:
– Experimental results may indicate the impact of using technologies in different contexts.

– Decision support.

63Marcos Kalinowski Experimental Software Engineering

Page 64: Experimental Software Engineering

Advantages of Conducting SLRs

| Characteristic | Traditional Review | Systematic Review |
|---|---|---|
| Question | Usually broadly scoped | Focused on research questions |
| Identification of research | Not specified, potentially biased | Several sources and well-defined search strategy |
| Selection | Not specified, potentially biased | Selection based on explicit criteria |
| Evaluation | Variable | Rigorous assessment |
| Synthesis | Frequently a qualitative summary | Qualitative and quantitative |
| Inferences | Sometimes based on evidence | Usually based on evidence |

64Marcos Kalinowski Experimental Software Engineering

Page 65: Experimental Software Engineering

[Figure: SLR selection process – primary studies (surveys, case studies, experiments) pass through a first and a second selection filter before data is extracted.]

65Marcos Kalinowski Experimental Software Engineering

Page 66: Experimental Software Engineering

Systematic Mapping Study (SMS)

• Secondary study approach

• Rigorous review that uses a formal process to:

– Identify all relevant research on a specific topic

– SMSs are conducted to identify and categorize existing studies

• Provides only an overview of the research topic

• There is no comparison of results of methods or techniques

66 Marcos Kalinowski Experimental Software Engineering

PETERSEN, K., FELDT, R., MUJTABA, S., MATTSSON, M. (2008) Systematic Mapping Studies in Software Engineering. In: 12th International Conference on Evaluation and Assessment in Software Engineering.

Page 67: Experimental Software Engineering

Discussion of the Papers: Best Practices and Examples

Marcos Kalinowski 67Experimental Software Engineering

Page 68: Experimental Software Engineering

Required Reading

• Kuhrmann, M., Fernández, D.M. and Daneva, M., 2017. On the pragmatic design of literature studies in software engineering: an experience-based guideline. Empirical software engineering, 22(6), pp.2852-2891.

• Cruzes, D.S. and Dybå, T., 2011. Research synthesis in software engineering: A tertiary study. Information and Software Technology, 53(5), pp.440-455.

Marcos Kalinowski 68Experimental Software Engineering

Page 69: Experimental Software Engineering

EXPERIMENT PROCESS, SCOPING AND PLANNING

69Marcos Kalinowski Experimental Software Engineering

Page 70: Experimental Software Engineering

Required Reading

• Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.
– Chapters 6 (Experiment Process), 7 (Scoping), and 8 (Planning).

• Optional Reading (examples of experiments):
– ESTÁCIO, B., OLIVEIRA, R., MARCZAK, S., KALINOWSKI, M., GARCIA, A., PRIKLADNICKI, R., LUCENA, C. Evaluating Collaborative Practices in Acquiring Programming Skills: Findings of a Controlled Experiment. In: Simpósio Brasileiro de Engenharia de Software (SBES), 2015, Belo Horizonte.

– RIVERO, L., KALINOWSKI, M., CONTE, T. Practical Findings from Applying Innovative Design Usability Evaluation Technologies for Mockups of Web Applications. In: 47th Hawaii International Conference on System Sciences (HICSS), 2014.

Marcos Kalinowski 70Experimental Software Engineering

Page 71: Experimental Software Engineering

• Experimentation Process

• Experiment Scoping

• Experiment Planning
– Context Selection

– Hypotheses Formulation

– Variable Selection

– Participant Selection

– Experiment Design

– Instrumentation

– Threats to Validity

Agenda

71Marcos Kalinowski Experimental Software Engineering

Page 72: Experimental Software Engineering

Experimentation Process

Scoping

Planning

Execution

Analysis

72Marcos Kalinowski Experimental Software Engineering

Page 73: Experimental Software Engineering

• Scoping
– Identification of the study goals
– Identification of the objects and groups

• Planning
– Formulation of hypotheses
– Identification of dependent variables (response variables)
– Identification of independent variables (factors)
– Selection of subjects
– Experiment design
– Selection of the analysis methods
– Instrumentation
– Validity evaluation (threats to validity)

Marcos Kalinowski 73Experimental Software Engineering

Experiment Process

Page 74: Experimental Software Engineering

• Operation
– Training and preparation
– Execution of the study by the participants

• Analysis
– Descriptive statistics
– Graphical visualization
– Elimination of outliers
– Analysis of the distribution
– Statistical hypothesis testing

• Packaging
– Presentation of the results
– Preparation of the package to repeat the study

Marcos Kalinowski 74Experimental Software Engineering

Experiment Process

Page 75: Experimental Software Engineering

Experiment Scoping

• Identify the Goal and the Context of the Study

GQM template:

“Analyze <Object(s) of study> for the purpose of <Purpose> with respect to their <Quality focus> from the point of view of the <Perspective> in the context of <Context>”.

• Identify the objects and study groups (control and experimental group)

Marcos Kalinowski 75Experimental Software Engineering

Page 76: Experimental Software Engineering

• Experimentation Process

• Experiment Scoping

• Experiment Planning
– Context Selection

– Hypotheses Formulation

– Variable Selection

– Participant Selection

– Experiment Design

– Instrumentation

– Threats to Validity

Agenda

Marcos Kalinowski 76Experimental Software Engineering

Page 77: Experimental Software Engineering

Experiment Planning

Marcos Kalinowski 77Experimental Software Engineering

Page 78: Experimental Software Engineering

Context Selection

• Four dimensions:

– Off-line vs on-line;

– Students vs professionals;

– Toy vs real problems;

– Specific vs general.

Marcos Kalinowski 78Experimental Software Engineering

Page 79: Experimental Software Engineering

Hypothesis Formulation

• Null Hypothesis;

• Alternative Hypotheses.

Marcos Kalinowski 79Experimental Software Engineering

Page 80: Experimental Software Engineering

Variable Selection

• Dependent Variables (Response Variables);

• Independent Variables (including Factors).

Marcos Kalinowski 80Experimental Software Engineering

Page 81: Experimental Software Engineering

Participant Selection

• Sample selection.

– Selecting subjects at random is not always possible

Marcos Kalinowski 81Experimental Software Engineering

Page 82: Experimental Software Engineering

Experiment Design

• Principles:

– Randomization;

– Blocking;

– Balancing;

• Design Types:

– Number of factors;

– Number of treatments.

Marcos Kalinowski 82Experimental Software Engineering

Page 83: Experimental Software Engineering

Instrumentation

• Instruments should be completely developed before conducting the experiment and ideally evaluated through a pilot study.

• Examples: Agreement to participate, subject characterization form, study objects, task description, measurement instruments, follow-up questionnaire.

Marcos Kalinowski 83Experimental Software Engineering

Page 84: Experimental Software Engineering

Threats to Validity

• Conclusion Validity;

• Internal Validity;

• Construct Validity;

• External Validity.

Marcos Kalinowski 84Experimental Software Engineering

Page 85: Experimental Software Engineering

[Figure (Wohlin et al., 2012): at the theory level, a cause construct relates to an effect construct; at the observation level, a treatment (independent variable) applied in the experiment operation produces a result measured by the dependent variable.]

Marcos Kalinowski Experimental Software Engineering 85

WOHLIN, C., RUNESON, P., HÖST, M., OHLSSON, M., REGNELL, B., WESSLÉN, A. (2012) Experimentation in Software Engineering. Springer.

Controlled Experiment

Page 86: Experimental Software Engineering

Threats to Validity

• Results of experiments should be reported considering their validity

– Internal

– External

– Construct

– Conclusion

Marcos Kalinowski Experimental Software Engineering 86

BIFFL, S.; KALINOWSKI, M.; EKAPUTRA, F.; ANDERLIN-NETO, A.; CONTE, T.; WINKLER, D. (2014) Towards a semantic knowledge base on threats to validity and control actions in controlled experiments. In: 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Torino, Italy.

Page 87: Experimental Software Engineering

Exercises

Marcos Kalinowski Experimental Software Engineering 87

WOHLIN, C., RUNESON, P., HÖST, M., OHLSSON, M., REGNELL, B., WESSLÉN, A. (2012) Experimentation in Software Engineering. Springer.

• What are a null hypothesis and an alternative hypothesis?

• What are type-I and type-II errors, respectively; which is worse and why?

• In which different ways may subjects be sampled?

• What different types of experiment designs are available, and how do the designs relate to the statistical methods to apply in the analysis?

• What are the types of threats to validity? Provide one example threat for each type.

Page 88: Experimental Software Engineering

EXPERIMENT DESIGN: ADVANCED CONCEPTS

Marcos Kalinowski 88Experimental Software Engineering

Page 89: Experimental Software Engineering

Required Reading

• ESTÁCIO, B., OLIVEIRA, R., MARCZAK, S., KALINOWSKI, M., GARCIA, A., PRIKLADNICKI, R., LUCENA, C. Evaluating Collaborative Practices in Acquiring Programming Skills: Findings of a Controlled Experiment. In: Simpósio Brasileiro de Engenharia de Software (SBES), 2015, Belo Horizonte, Brazil.

• RIVERO, L., KALINOWSKI, M., CONTE, T. Practical Findings from Applying Innovative Design Usability Evaluation Technologies for Mockups of Web Applications. In: 47th Hawaii International Conference on System Sciences (HICSS), 2014.

Marcos Kalinowski 89Experimental Software Engineering

Page 90: Experimental Software Engineering

ADVANCED CONCEPTS (Let's plan and manage more complex studies)

Based on material kindly provided by Prof. Guilherme Horta Travassos

Marcos Kalinowski 90Experimental Software Engineering

Page 91: Experimental Software Engineering

Principles of Experimental Designs

• Simple designs help to make the experiment practical
– minimizing use of time, money, personnel and experimental resources

– easier to analyze

• Maximizing information yields more complete understanding
– allows generalization to the widest possible situations

• Consider several issues to simplify and maximize:
– experimental error

– replication

– randomization

– local control

Marcos Kalinowski 91Experimental Software Engineering

Page 92: Experimental Software Engineering

Factors and Experimental Design

• A factor is an independent variable in the design.

– Example: To determine the effects of experience and language on productivity, the design may have two independent variables: experience and language. The dependent variable is productivity.

• Values or classifications for each factor are called levels of the factor.

• Levels can be continuous or discrete, quantitative or qualitative.

– Example: Number of years of experience

Marcos Kalinowski 92Experimental Software Engineering

Page 93: Experimental Software Engineering

Experimental Error

• Experimental error describes the failure of two identically treated experimental units to yield identical results
– reflects errors of experimentation

– reflects errors of observation

– reflects errors of measurement

– reflects the variation in experimental resources

– reflects the combined effects of confounding factors that can influence the characteristics under study but which have not been singled out for attention in the investigation

• Example: Error may be due to
– mind wandering

– timer measured elapsed time inexactly

– distractions: loud noises in next room

– …

Marcos Kalinowski 93Experimental Software Engineering

Page 94: Experimental Software Engineering

How to Control Error

• Control as many variables as possible

• Minimize variability among participants

• Minimize effects of irrelevant variables

• Try to use design to distribute effects of irrelevant variables equally across all experimental conditions

• Techniques for controlling error in the design– Replication

– Randomization

– Local control

Marcos Kalinowski 94Experimental Software Engineering

Page 95: Experimental Software Engineering

Replication

• Represents the repetition of the basic experiment

• It means repeating an experiment under identical conditions, rather than repeating measurements on the same unit

• It provides an estimate of experimental error that acts as a basis for assessing the importance of observed differences in an independent variable (that is, how much confidence we can have in the results)

• It enables us to estimate the mean effect of any experimental factor

Marcos Kalinowski 95Experimental Software Engineering

Page 96: Experimental Software Engineering

Confounding

• Two or more variables are confounded if it is impossible to separate their effects when the subsequent analysis is performed.
– Example: you are comparing the use of a new tool with your existing tool. Programmer A uses the new tool in your development environment, while B uses the existing tool. If you compare measures of quality in the resulting code, the difference is due to the tools only if you have accounted for differences in skill of the programmers. That is, the effects of tools and programmer skill are confounded.

• Confounding is introduced when there is no control for other variables.

• Sequence can also confound (learning effect): Test team uses technique X to test, then technique Y.

Marcos Kalinowski 96Experimental Software Engineering

Page 97: Experimental Software Engineering

Randomization

• Replication allows us to know the statistical significance of the results, but not the validity. That is, we want to be sure that the results followed from the treatments applied. For this, we distribute the observations independently.

• Randomization is the random assignment of subjects to groups or of treatments to experimental units, so that we can assume independence and thus validity of results.

• Randomization does not guarantee independence but keeps variation of bias to a minimum.

Marcos Kalinowski 97Experimental Software Engineering

Page 98: Experimental Software Engineering

Local Control

• Reflects how much control you have over the placement of subjects in experimental units and the organization of those units.

• Makes the design more efficient by reducing the magnitude of experimental error.

• Two aspects of local control:

– blocking

– balancing the units

Marcos Kalinowski 98Experimental Software Engineering

Page 99: Experimental Software Engineering

Blocking

• allocates experimental units to blocks or groups so that the units within a block are relatively homogeneous

• predictable variation among units is confounded with the effects of the blocks

• Example: investigating the effects of three design techniques on code quality.
– Teach the techniques to 12 developers, measure the number of defects per thousand lines of code
– If the 12 graduated from 3 universities, training at each university may affect the way the design technique is understood or used
– To eliminate the effects of this, define three blocks: the first has all developers from university X, the second from university Y, the third from university Z
– Then assign treatments randomly to the developers within each block (see the sketch below)

Marcos Kalinowski 99Experimental Software Engineering
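A minimal sketch of the blocking and randomization idea in this example (hypothetical developer and technique names, not data from the slides):

```python
import random

# Hypothetical example: 12 developers from 3 universities (blocks) and
# 3 design techniques (treatments) assigned randomly within each block.
developers = {
    "University X": ["D1", "D2", "D3", "D4"],
    "University Y": ["D5", "D6", "D7", "D8"],
    "University Z": ["D9", "D10", "D11", "D12"],
}
treatments = ["Technique A", "Technique B", "Technique C"]

random.seed(42)  # fixed seed so the assignment can be reproduced
assignment = {}
for block, subjects in developers.items():
    shuffled = subjects[:]
    random.shuffle(shuffled)  # randomization within the block
    for i, subject in enumerate(shuffled):
        # Roughly balanced: with 4 subjects and 3 treatments,
        # the per-block treatment counts differ by at most one.
        assignment[subject] = (block, treatments[i % len(treatments)])

for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```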

Page 100: Experimental Software Engineering

Balancing

• blocking and assigning treatments so that an equal number of subjects is assigned to each treatment, whenever possible

• simplifies statistical analysis

• designs can range from being completely balanced to little or no balance

• If a design has no blocks, it must be completely randomized.

Marcos Kalinowski 100Experimental Software Engineering

Page 101: Experimental Software Engineering

Types of Experimental Designs

• The type of design can constrain the analysis.
– For example, the way to perform an analysis of variance depends on the number of variables and the way in which subjects are grouped and balanced.

• The measurement scale can constrain the analysis.
– Nominal scales divide data into categories, while ordinal scales permit rank ordering and more powerful tests. Parametric tests such as analysis of variance require at least an interval scale.

• Sampling can constrain the analysis.
– Degree of randomization
– Distribution of data
• Normal or near-normal and homoscedastic distributions can use parametric tests; otherwise, non-parametric tests are preferable (see the sketch below).

Marcos Kalinowski 101Experimental Software Engineering
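As a minimal sketch of this decision (assuming SciPy is available; the samples below are only illustrative task times in days), one can check approximate normality and then pick a parametric or non-parametric test:

```python
from scipy import stats

# Illustrative task times (days) for two treatments.
times_y = [10, 13, 12, 13, 10, 14, 14, 13, 14, 14, 13]
times_x = [13, 9, 11, 14, 9, 12, 9, 12, 11, 14, 13]

# Shapiro-Wilk test as a rough normality check for each sample.
normal_y = stats.shapiro(times_y).pvalue > 0.05
normal_x = stats.shapiro(times_x).pvalue > 0.05

if normal_y and normal_x:
    # Parametric test (assumes interval/ratio scale and near-normal data).
    result = stats.ttest_ind(times_y, times_x, equal_var=False)
    test_name = "Welch t-test"
else:
    # Non-parametric alternative that only relies on rank ordering.
    result = stats.mannwhitneyu(times_y, times_x, alternative="two-sided")
    test_name = "Mann-Whitney U"

print(test_name, "p-value =", result.pvalue)
```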

Page 102: Experimental Software Engineering

EXPERIMENT ANALYSIS AND INTERPRETATION

Marcos Kalinowski 102Experimental Software Engineering

Page 103: Experimental Software Engineering

Required Reading

• Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., Wesslén, A., Experimentation in Software Engineering, Springer, 2012.
– Chapters 9 (Operation) and 10 (Analysis and Interpretation)

• ESTÁCIO, B., OLIVEIRA, R., MARCZAK, S., KALINOWSKI, M., GARCIA, A., PRIKLADNICKI, R., LUCENA, C. Evaluating Collaborative Practices in Acquiring Programming Skills: Findings of a Controlled Experiment. In: Simpósio Brasileiro de Engenharia de Software (SBES), 2015, Belo Horizonte, Brazil.

• RIVERO, L., KALINOWSKI, M., CONTE, T. Practical Findings from Applying Innovative Design Usability Evaluation Technologies for Mockups of Web Applications. In: 47th Hawaii International Conference on System Sciences (HICSS), 2014.

Marcos Kalinowski 103Experimental Software Engineering

Page 104: Experimental Software Engineering

ADVANCED CONCEPTS (Let's talk about statistics and data analysis)

Based on material kindly provided by Prof. Guilherme Horta Travassos

Marcos Kalinowski 104Experimental Software Engineering

Page 105: Experimental Software Engineering

Experimentation Process

Definition

Planning

Execution

Analysis

Statistical Inference Techniques

Marcos Kalinowski 105Experimental Software Engineering

Page 106: Experimental Software Engineering

Hypotheses, Variables and Scales

• Planning and Hypotheses

• Hypotheses

• Choosing the variables

• Scales

• Scales’ information level

• Scales and basic operations

Marcos Kalinowski 106Experimental Software Engineering

Page 107: Experimental Software Engineering

Planning and Hypotheses

• Planning

– Hypotheses Formulation

– Dependent variables identification (responses)

– Independent variables Identification (factors)

– Participant Selection

– Study Design

– Selection of Analysis Methods

– Instruments Definition

– Threats to validity (experiment risks)

Marcos Kalinowski 107Experimental Software Engineering

Page 108: Experimental Software Engineering

Hypothesis

• A hypothesis is a theory or supposition that can explain a given behavior of research interest

• An experimental study aims at collecting data, in a controlled environment, to support confirming or refuting the hypothesis

“Developers using the technique Y can conclude the task of requirements analysis in less time and produce a more complete requirements set than when using the technique X”

Marcos Kalinowski 108Experimental Software Engineering
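For the time variable in this hypothesis, a possible formalization (assuming the mean task time per technique is the quantity of interest) is:

```latex
H_0: \mu_{\text{time}}(Y) \geq \mu_{\text{time}}(X)
\qquad
H_1: \mu_{\text{time}}(Y) < \mu_{\text{time}}(X)
```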

Page 109: Experimental Software Engineering

Hypotheses and Variables

• Hypotheses guide the definition of variables

• Independent Variables (become factors when controlled)
– Relate to process inputs. Can be controlled.

– Represent the causes that are expected to affect the results. When controlled, their values are called treatments.

• Dependent Variables
– Relate to process outputs and are affected throughout the experimentation process.

– Represent the effect of the combination of the independent variables' values (including the factors). Their possible values are called results.

Marcos Kalinowski 109Experimental Software Engineering

Page 110: Experimental Software Engineering

Hypotheses and Variables

“Developers using the technique Y can conclude the task of requirements analysis in less time and produce a more complete requirements set than when using the technique X”

Independent Variables

Used technique (treatments: Y and X)

Developers Characterization

Application Characterization

Dependent Variables

Time to execute the task

% of right requirements defined

Marcos Kalinowski 110Experimental Software Engineering

Page 111: Experimental Software Engineering

Variables and their values

• Studies' variables can be:

– Qualitative: the values (treatments) represent types

– Quantitative: the values represent levels for the variable application

• The values of the variables are collected in scales:

– There are different scales that can be used to collect and represent these values: nominal, ordinal, interval and ratio.

– The scales specify the operations that can be applied to the variables values

Marcos Kalinowski 111Experimental Software Engineering

Page 112: Experimental Software Engineering

Nominal Scale

• Nominal scale values represent different types of an element, without numerical interpretation nor ordering among them.

• Examples in software include:
– Names of different measures of software size (lines of code, function points, use case points, ...)
– Names of different programming languages (Java, C++, C#, Pascal, ...)

• The scale does not allow us to say, for instance, that lines of code is greater than function points nor that Java is less than C#

Marcos Kalinowski 112Experimental Software Engineering

Page 113: Experimental Software Engineering

Ordinal Scale

• Ordinal scale values represent different element types that can be ordered, with no numerical interpretation

• Examples in software include:

– Different CMMI levels (1, ..., 5) or MPS.BR levels (G, ..., A)

• The scale allows us to say, for instance, that CMMI 2 is less than CMMI 3, but does not allow us to say that the quality difference between companies at CMMI 2 and CMMI 3 is the same as between CMMI 3 and CMMI 4.

Marcos Kalinowski 113Experimental Software Engineering

Page 114: Experimental Software Engineering

Interval Scale

• Interval scale values can be ordered and the distance between consecutive values can be interpreted equally; however, the ratio between these values has no meaning.

• For instance: although we can say that 2011 represents a year after 2010 and a year before 2012, there is no meaning in calculating the ratio between 2011 and 2012.

• The comparison is possible just because every interval scale has an arbitrary zero point (in the case of dates, the year 0)

Marcos Kalinowski 114Experimental Software Engineering

Page 115: Experimental Software Engineering

Interval Scale

• Likert scales are an example of interval scale widely used in software-related studies

– Using a Likert scale we can define different names to represent, in general, the intensity of a property that cannot be directly measured.

– For instance, we can build a Likert scale to evaluate risk impact using the following values: very high, high, medium, low and very low.

– Although it is impossible to verify the interval distance in the real world, it is assumed that these values are equally spaced.

Marcos Kalinowski 115Experimental Software Engineering

Page 116: Experimental Software Engineering

Ratio Scale

• Ratio scale values can be ordered, the distance between consecutive values has the same meaning, and the ratio between values can be interpreted.

• Examples in software include software size, effort and time for the project execution.

• The ratio scale allows us to say, for instance, that a software product with X lines of code is half the size of one with 2X lines of code

• In a ratio scale, 0 (zero) means absence of the measured attribute.

Marcos Kalinowski 116Experimental Software Engineering

Page 117: Experimental Software Engineering

Scales Information

More information is carried as we move from nominal to ratio:

Nominal – values can be counted

Ordinal – values can be counted and ordered

Interval – values can be counted and ordered; the distance between values can be interpreted

Ratio – values can be counted and ordered; the distance between values can be interpreted; the ratio between values can be interpreted

Marcos Kalinowski 117Experimental Software Engineering

Page 118: Experimental Software Engineering

Scales and Characteristics

• According to the variables' scales, we can explore different characteristics of their values

| Characteristic | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Values counting | X | X | X | X |
| Values ordering | | X | X | X |
| Equidistant intervals | | | X | X |
| Adding and subtracting values | | | | X |
| Values division | | | | X |

Marcos Kalinowski 118Experimental Software Engineering

Page 119: Experimental Software Engineering

Example

“Developers using the technique Y can conclude the task of requirements analysis in less time and produce a more complete requirements set than when using the technique X”

Independent Variables
– Used technique (treatments: Y and X): nominal scale with 2 treatments
– Developers and application characterization: nominal or ordinal scale

Dependent Variables
– Time to execute the task: ratio scale
– % of right requirements defined: ratio scale

Marcos Kalinowski 119Experimental Software Engineering

Page 120: Experimental Software Engineering

Tabulation and Graphics

• Variables and execution

• Tabulation

• Graphical Analysis

• Histograms

• Pie Charts

• Dispersion Charts

• Control Charts

Marcos Kalinowski 120Experimental Software Engineering

Page 121: Experimental Software Engineering

Variables and Execution

• The execution of an experimental study consists of a series of trials
– In each trial, a participant applies one treatment from the independent variables set and produces results for each dependent variable

– These results are collected in tuples of type Ai = {Ti, Ri}, where Ti is the ordered set of the treatments of each independent variable applied by participant i and Ri represents the ordered set of the results obtained by the same participant for each dependent variable

– These results are the input for the data analysis in the experimental study.

Marcos Kalinowski 121Experimental Software Engineering

Page 122: Experimental Software Engineering

Variables and Execution
• Some tabulated data after the execution of a hypothetical study. These data will be used in the next slides.

| Participant | Technique | Time (days) | % Right Found |
|---|---|---|---|
| 1 | Y | 10 | 83% |
| 2 | Y | 13 | 73% |
| 3 | Y | 12 | 87% |
| 4 | Y | 13 | 78% |
| 5 | Y | 10 | 74% |
| 6 | Y | 14 | 74% |
| 7 | Y | 14 | 87% |
| 8 | Y | 13 | 75% |
| 9 | Y | 14 | 86% |
| 10 | Y | 14 | 82% |
| 11 | Y | 13 | 77% |
| 12 | X | 13 | 90% |
| 13 | X | 9 | 89% |
| 14 | X | 11 | 88% |
| 15 | X | 14 | 87% |
| 16 | X | 9 | 97% |
| 17 | X | 12 | 81% |
| 18 | X | 9 | 82% |
| 19 | X | 12 | 86% |
| 20 | X | 11 | 92% |
| 21 | X | 14 | 96% |
| 22 | X | 13 | 98% |

Marcos Kalinowski 122 Experimental Software Engineering
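A minimal sketch of how this tabulated data could be loaded and summarized per treatment (assuming pandas is available; column names are illustrative):

```python
import pandas as pd

# Tabulated results of the hypothetical study: participant, technique, time (days), % right found.
data = [
    (1, "Y", 10, 0.83), (2, "Y", 13, 0.73), (3, "Y", 12, 0.87), (4, "Y", 13, 0.78),
    (5, "Y", 10, 0.74), (6, "Y", 14, 0.74), (7, "Y", 14, 0.87), (8, "Y", 13, 0.75),
    (9, "Y", 14, 0.86), (10, "Y", 14, 0.82), (11, "Y", 13, 0.77),
    (12, "X", 13, 0.90), (13, "X", 9, 0.89), (14, "X", 11, 0.88), (15, "X", 14, 0.87),
    (16, "X", 9, 0.97), (17, "X", 12, 0.81), (18, "X", 9, 0.82), (19, "X", 12, 0.86),
    (20, "X", 11, 0.92), (21, "X", 14, 0.96), (22, "X", 13, 0.98),
]
df = pd.DataFrame(data, columns=["participant", "technique", "time_days", "pct_right"])

# Descriptive statistics per treatment, as a first look at the data.
print(df.groupby("technique")[["time_days", "pct_right"]].agg(["mean", "median", "std"]))
```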

Page 123: Experimental Software Engineering

Variables and Execution

• After data tabulation, central tendency measurements, dispersion and dependency can be used together with graphical analysis to better “understand” the data.

• This understanding is important when selecting and applying the statistical inference techniques that will support the hypothesis testing.

Marcos Kalinowski 123Experimental Software Engineering

Page 124: Experimental Software Engineering

Graphical Visualization

• A chart visually represents the tabulated information

– Charts are usually easier to understand when compared to large tabulated data sets

– The spatial data presentation helps in the identification of groups and the visualization of relationships among them

– In general, charts can be quickly read

• Methods for graphical representation of data

– Histograms

– Pie Charts

– Dispersion charts

Marcos Kalinowski 124Experimental Software Engineering

Page 125: Experimental Software Engineering

Graphical Visualization

• The graphical visualization methods can depend on the variables classification (continuous, discrete)

• Discrete variables can assume any value within a defined finite set of values
– They are more common in nominal or ordinal scales. However, they can also occur in the interval and ratio scales

• Continuous variables can assume any value in an interval with an infinite set of values
– They are common in the interval and ratio scales.

Marcos Kalinowski 125Experimental Software Engineering

Page 126: Experimental Software Engineering

Histograms

• It shows the observed values regarding one specific variable in the frequency domain

– The frequency indicates the number or percentage of occurrences for each value from the collected values set

– If the data is discrete, each value is presented as a bar as high as the number of times that the value occurs in the value set

– If the data is continuous, it must be made discrete, i.e., the data needs to be split into equidistant regions. Then, one counts how many times the values of each region show up in the collected value set. Next, a bar can be drawn as for discrete data (see the sketch below).

Marcos Kalinowski 126Experimental Software Engineering
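A minimal sketch of the two cases (assuming matplotlib is available; the data is the illustrative time per participant for technique Y):

```python
import matplotlib.pyplot as plt

# Illustrative observed times (days) for technique Y, a discrete variable.
times_y = [10, 13, 12, 13, 10, 14, 14, 13, 14, 14, 13]

# Discrete data: one bar per observed value, as high as its number of occurrences.
values = sorted(set(times_y))
counts = [times_y.count(v) for v in values]
plt.bar(values, counts)

# Continuous data would instead use plt.hist(data, bins=...), which splits the
# range into equidistant regions and counts the occurrences per region.
plt.xlabel("Time (days)")
plt.ylabel("# participants")
plt.title("Histogram of time spent (technique Y)")
plt.show()
```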

Page 127: Experimental Software Engineering

Histograms

• It is a common representation method for numerical data in any scale, because it involves only counting.

• Histograms also allow relating observed data to known frequency distributions

– These distributions have mathematical properties from which the statistical inference tests are derived

• If the observed data do not follow these properties (normality, for instance), we cannot be confident in the results of the testing. In these cases, other types of statistical tests must be used

Marcos Kalinowski 127Experimental Software Engineering

Page 128: Experimental Software Engineering

Histograms
• Histogram of time spent by the participants in the analysis activity, according to the used technique

| Time (days) | Technique Y | Technique X |
|---|---|---|
| 9 | 0 | 3 |
| 10 | 2 | 0 |
| 11 | 0 | 2 |
| 12 | 1 | 2 |
| 13 | 4 | 2 |
| 14 | 4 | 2 |

* Data Distribution Table

[Histogram: x-axis Time (days), y-axis # participants]

Marcos Kalinowski 128Experimental Software Engineering

Page 129: Experimental Software Engineering

Cumulative Histogram

• A cumulative histogram shows the frequency of occurrence of values less than or equal to a specific value.
– Each bar in the graph represents the sum of the previous bars in a conventional histogram

– In different configurations, it is possible to get some indication about the acceptance or rejection of the hypothesis by observing the cumulative histogram (however, only statistical testing can confirm it!)

– Because the data must be ordered, cumulative histograms cannot be used with nominal scale variable values.

Marcos Kalinowski 129Experimental Software Engineering

Page 130: Experimental Software Engineering

Cumulative Histogram

• Cumulative histogram for time spent by the participants in the analysis activities with techniques X and Y

[Cumulative histogram: x-axis Time (days), y-axis # participants, one series per technique (Technique Y, Technique X)]

| Time (days) | Technique Y | Technique X |
|---|---|---|
| 9 | 0 | 3 |
| 10 | 2 | 3 |
| 11 | 2 | 5 |
| 12 | 3 | 7 |
| 13 | 7 | 9 |
| 14 | 11 | 11 |

* Data Distribution Table

Marcos Kalinowski 130Experimental Software Engineering

Page 131: Experimental Software Engineering

Pie Chart

• A pie (pizza) chart shows the relative frequency (or percentage) of data occurrence, dividing the data by a set of distinct classes and presenting them as proportional slices in the circle.

[Pie chart: Technique X – % of participants per time (days): 9 days 28%, 11 days 18%, 12 days 18%, 13 days 18%, 14 days 18%]

Marcos Kalinowski 131Experimental Software Engineering

Page 132: Experimental Software Engineering

Dispersion Diagram

• A dispersion diagram (scatter plot) shows the observed values of two or more variables in a Cartesian graph.

– Each axis represents one of the variables, composing tuples (two or more dimensions)

– This representation format helps in the identification of patterns that can suggest relations between the variables

– Dispersion diagrams also help to identify values that deviate from the normal behavior (outliers). Outliers can distort the statistical analysis and usually should be eliminated before statistical tests.

Marcos Kalinowski 132Experimental Software Engineering

Page 133: Experimental Software Engineering

Dispersion Diagram

• Dispersion between the percentage of right requirements found and the execution time of the analysis activity with techniques X and Y

[Scatter plot: % of right requirements found (y-axis) vs. time in days (x-axis), one point series per technique]

Marcos Kalinowski 133Experimental Software Engineering

Page 134: Experimental Software Engineering

Control Charts

• Statistical tool that allows observing the behavior of quantitative data representing the characteristic under investigation

• A typical control chart presents 3 parallel lines:

– A central line, representing the mean behavior of the data

– An upper extreme limit, called UCL (Upper Control Limit)

– A lower extreme limit, called LCL (Lower Control Limit)

Marcos Kalinowski 134Experimental Software Engineering

Page 135: Experimental Software Engineering

Control Charts

[Control chart: number of defects (y-axis) per software version (x-axis), with UCL = 26.81, mean = 15.14 and LCL = 3.46]

• If the behavior of the characteristic is under control, its values will oscillate around the center line (for instance, the mean number of defects per software version), within the UCL and LCL limits.

• When the behavior of the characteristic is under control, the probability of obtaining a value outside these limits is very low.
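
A minimal Python sketch of computing such limits; the slide does not state how UCL and LCL were obtained, so the common "mean +/- 3 standard deviations" convention is assumed here, and the defect counts are hypothetical.

    import statistics

    defects_per_version = [12, 18, 9, 22, 15, 14, 11, 19, 16, 13]  # hypothetical data

    mean = statistics.mean(defects_per_version)
    sd = statistics.pstdev(defects_per_version)   # population standard deviation

    ucl = mean + 3 * sd                           # Upper Control Limit (assumed convention)
    lcl = max(mean - 3 * sd, 0)                   # Lower Control Limit, floored at zero
    out_of_control = [v for v in defects_per_version if v < lcl or v > ucl]
    print(round(mean, 2), round(lcl, 2), round(ucl, 2), out_of_control)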

Marcos Kalinowski 135Experimental Software Engineering

Page 136: Experimental Software Engineering

Descriptive Statistics

• Objectives

• Central Tendency Measures

• Dispersion Measures

• Frequency Distribution

• Example

• Dependency Measurements

Marcos Kalinowski 136Experimental Software Engineering

Page 137: Experimental Software Engineering

Objectives

• To describe, through statistical methods, the behavior and trends of the characteristics observed in the data collected in the experimental study

– Together with the graphical analysis, it allows an initial analysis of the data and the measurement of dependencies and relationships among the data.

• It aims to give a general view of the overall distribution of the data set.

Marcos Kalinowski 137Experimental Software Engineering

Page 138: Experimental Software Engineering

Central Tendency Measures

• Show the middle values of the observed data set

– Mean (arithmetic): meaningful for the interval and ratio scales

– Median: represents the middle value of an ordered data set, so that the number of samples greater than the median equals the number of samples smaller than the median
• Odd number of samples: the median is the middle sample
• Even number of samples: the median is the mean of the two middle samples

– Mode: represents the most frequently occurring value. It is meaningful for the nominal, ordinal, interval and ratio scales.
• Well defined when just one value has the highest count
• When several values tie for the highest count and their number is odd, the middle one of these values can be taken as the mode (not valid for the nominal scale)

Marcos Kalinowski 138Experimental Software Engineering

Page 139: Experimental Software Engineering

Central Tendency Measures

• Other relevant measures

– Minimum Value: the lowest observed value in the collected data set

– Maximum Value: the highest observed value in the collected data set

– Percentile: considering a sample with 100 values, the X% percentile represents the value that splits the sample into X values lower than it and (100 - X) values greater than it. The median is a special case of the percentile, namely the 50% percentile

– Quartile: the values representing the 25% percentile (1st Quartile), the median (2nd Quartile) and the 75% percentile (3rd Quartile).

Marcos Kalinowski 139Experimental Software Engineering

Page 140: Experimental Software Engineering

Dispersion Measures

• Measure the level of variation around the central tendency, i.e., how spread out or concentrated the data is

– Range: represents the distance between the maximum and minimum data values

– Variance: the mean of the squared distances from the sample mean. It is meaningful for the interval and ratio scales.

– Standard deviation: the square root of the variance, having the same dimension (unit of measure) as the data values themselves.

[Figure: two frequency vs. x plots illustrating different levels of dispersion]

Marcos Kalinowski 140Experimental Software Engineering

Page 141: Experimental Software Engineering

Descriptive Statistics

[Bar chart: "Activity Y (time in days)" – time spent by each of the 11 participants who applied technique Y in the analysis activity]

Marcos Kalinowski 141Experimental Software Engineering

Page 142: Experimental Software Engineering

Descriptive Statistics

Measures of Tendency and Dispersion:

Measure            | Value
Mean               | 12.73
Median             | 13
Modes              | 13 and 14
Range              | 4
Minimum            | 10
Maximum            | 14
1st Quartile       | 12.5
3rd Quartile       | 14
Variance           | 2.22
Standard Deviation | 1.49

[Histogram: "Technique Y (time histogram)" – # participants per time value, 9 to 14 days]
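
A minimal Python sketch (statistics module) that reproduces the values in the table above; the raw list of times is reconstructed from the Technique Y distribution table shown earlier, and the sample (n - 1) variance and standard deviation are used.

    import statistics

    times_y = [10, 10, 12, 13, 13, 13, 13, 14, 14, 14, 14]  # days, Technique Y

    print(round(statistics.mean(times_y), 2))      # 12.73
    print(statistics.median(times_y))              # 13
    print(statistics.multimode(times_y))           # [13, 14]
    print(max(times_y) - min(times_y))             # range = 4
    print(round(statistics.variance(times_y), 2))  # 2.22 (sample variance)
    print(round(statistics.stdev(times_y), 2))     # 1.49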

Marcos Kalinowski 142Experimental Software Engineering

Page 143: Experimental Software Engineering

Descriptive Statistics

Mean: x̄ = (Σ xi) / n, summing over i = 1 to n

Variance: s² = Σ (xi - x̄)² / (n - 1), the sample variance used in the previous example

Standard Deviation: s = √s²

There are other measures (such as kurtosis, asymmetry/skewness, geometric mean, ...), but they are out of the scope of our discussion

Marcos Kalinowski 143Experimental Software Engineering

Page 144: Experimental Software Engineering

Frequency Distributions

• As seen, histograms can represent data in the frequency domain

• Histograms allow verifying whether the data follows a classical distribution, such as the normal, uniform or beta distributions, among others.

• The normal distribution, in particular, is important for some statistical tests, which require that the analyzed data follow a normal distribution.

Marcos Kalinowski 144Experimental Software Engineering

Page 145: Experimental Software Engineering

• The normal distribution has a bell shape, with its left and right tails extending from the central point.
– The curve is symmetric in relation to its mean, and its width is proportional to its standard deviation
– In this way, the curve can be fully defined by its mean and standard deviation.

Frequency Distributions

Marcos Kalinowski 145Experimental Software Engineering

Page 146: Experimental Software Engineering

Normal Distribution

• If a numerical data set follows the normal distribution, it is possible to claim:
– 68% of all observations are within the mean ± 1 standard deviation

– 95.5% of all observations are within the mean ± 2 standard deviations

– 99.7% of all observations are within the mean ± 3 standard deviations

Marcos Kalinowski 146Experimental Software Engineering

Page 147: Experimental Software Engineering

Measures of Dependency

• When two or more variables are related, it can be useful to calculate the dependency level among them.

• The Measures of Dependency define the strength and direction of the relationship among two or more variables when quantitatively evaluated.

– The most used Measure of Dependency is CORRELATION

– Correlation between two variables is represented by a number

– Correlation among more than two variables is represented through a correlation matrix

Marcos Kalinowski 147Experimental Software Engineering

Page 148: Experimental Software Engineering

Measures of Dependency

The CORRELATION between two variables ranges from -1 to 1

A correlation of -1 indicates that a high value in one variable corresponds to a low value in the other one

A correlation of 1 indicates that a high value in one variable corresponds to a high value in the other one

A correlation near 0 (or exactly 0) indicates that no relationship behavior can be inferred

CAUTION: CORRELATION alone does not imply CAUSATION!

Marcos Kalinowski 148Experimental Software Engineering

Page 149: Experimental Software Engineering

Marcos Kalinowski 149Experimental Software Engineering

Page 150: Experimental Software Engineering

Pearson Correlation

• Most common correlation coefficient

– Quantifies the strength of the linear association between two variables and describes how well a straight line can be fitted to the points

– The coefficient assumes the data distribution is normal

• Due to the normal distribution assumption, this condition can be indicated by an elliptical cloud of points in the dispersion (scatter) diagram of the two variables

Marcos Kalinowski 150Experimental Software Engineering

Page 151: Experimental Software Engineering

Measures of Dependency

[Three scatter plots of variables A and B illustrating different correlation levels: CORREL(A,B) = 0.02 (no apparent linear relationship), CORREL(A,B) = 0.98 (strong positive correlation) and CORREL(A,B) = -0.98 (strong negative correlation)]

Marcos Kalinowski 151Experimental Software Engineering

Page 152: Experimental Software Engineering

Spearman Correlation

• Represents another example of a correlation coefficient

– This method is based on the ranking of the collected values and not on the values themselves

– In this way, it can also be used for variables in ordinal scales

• It can also be used when the distribution is not normal
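
A minimal sketch, assuming SciPy is available, computing both coefficients for two hypothetical variables (size_kloc and defects are made-up data, not from the lecture):

    from scipy.stats import pearsonr, spearmanr

    size_kloc = [2.0, 3.5, 5.1, 7.4, 9.8, 12.0]   # hypothetical module sizes
    defects   = [4, 6, 9, 13, 17, 22]             # hypothetical defect counts

    r, p_pearson = pearsonr(size_kloc, defects)       # linear association, assumes normality
    rho, p_spearman = spearmanr(size_kloc, defects)   # rank-based, no normality assumption
    print(round(r, 2), round(rho, 2))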

Marcos Kalinowski 152Experimental Software Engineering

Page 153: Experimental Software Engineering

Regression Analysis

• Regression analysis extends the representation of dependency by providing an equation that describes the nature of the relationship.

• In simple regression analysis, the interest is in predicting the value of the dependent variable based on the values of the independent variable (see the sketch after the figure below).

[Scatter plot: number of defects (y-axis) per version (x-axis), showing the observed values and the fitted linear regression line]
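
A minimal sketch of a simple linear regression, assuming NumPy; the defect counts per version below are hypothetical.

    import numpy as np

    versions = np.arange(1, 11)                                   # independent variable
    defects = np.array([48, 42, 39, 35, 30, 28, 24, 20, 18, 15])  # dependent variable (hypothetical)

    slope, intercept = np.polyfit(versions, defects, deg=1)       # least-squares straight line
    print(round(slope, 2), round(intercept, 2))                   # defects ~= slope * version + intercept
    print((slope * versions + intercept).round(1))                # predicted values for each version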

Marcos Kalinowski 153Experimental Software Engineering

Page 154: Experimental Software Engineering

Descriptive Statistics

• According to the variable's scale, we can calculate:

Measure            | Nominal | Ordinal | Interval | Ratio
Mean               |         |         | X        | X
Median             |         | X       | X        | X
Mode               | X*      | X       | X        | X
Range              |         | X       | X        | X
Variance           |         |         | X        | X
Standard Deviation |         |         | X        | X
Pearson Corr.      |         |         | X        | X
Spearman Corr.     |         | X       | X        | X

* Remember the restrictions for the nominal scale!

Marcos Kalinowski 154Experimental Software Engineering

Page 155: Experimental Software Engineering

Outliers Analysis

• Concepts

• Conditions of Occurrence

• Visual Identification

• Numerical Identification

Marcos Kalinowski 155Experimental Software Engineering

Page 156: Experimental Software Engineering

Outliers Removal

• Extreme values (outliers) represent observed values that are too distant from the other data set values

– They can represent errors in the data set and usually must be removed before the statistical analysis

– They can occur due to problems in the study execution, typing, interpretation or participants' motivation

– It is important to verify the origin of each outlier, because some may represent valid observations that should be kept in the data set (false positives)

Marcos Kalinowski 156Experimental Software Engineering

Page 157: Experimental Software Engineering

Visual Identification

• Outliers can be visually identified through dispersion (scatter) diagrams or box-plots
– Box-plot diagrams were conceived to show the distribution of quantitative data
– They make use of measures of central tendency and dispersion to characterize the distribution

[Box-plot sketch, annotated with: Maximum Value, 3rd Quartile, Median, 1st Quartile, Minimum Value]

Marcos Kalinowski 157Experimental Software Engineering

Page 158: Experimental Software Engineering

Numerical Identification

• Outlier removal methods usually remove values whose distance from the mean or median exceeds a given limit

– Values near the limits do not necessarily need to be removed from the data set (subjectivity)

– The distance is usually determined as one quartile, one percentile or a specified number of standard deviations

• Quartile Method (see the sketch below)

– Lower outliers: values below Q1 - 1.5*IQ

– Upper outliers: values above Q3 + 1.5*IQ

– Where IQ = Q3 - Q1.
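
A minimal sketch of the quartile method, assuming NumPy; the sample is hypothetical, and quartile estimates may vary slightly with the interpolation method used.

    import numpy as np

    values = np.array([10, 12, 13, 13, 14, 14, 13, 12, 11, 13, 35])  # 35 looks suspicious

    q1, q3 = np.percentile(values, [25, 75])
    iq = q3 - q1
    lower_bound = q1 - 1.5 * iq
    upper_bound = q3 + 1.5 * iq
    outliers = values[(values < lower_bound) | (values > upper_bound)]
    print(lower_bound, upper_bound, outliers)  # 35 is flagged as an upper outlier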

Marcos Kalinowski 158Experimental Software Engineering

Page 159: Experimental Software Engineering

Numerical Identification

• Removing outliers from the percentage of right requirements found by the participants who applied techniques X and Y

Participant | Time (days) | % right requirements
1           | 10          | 83%
2           | 13          | 73%
3           | 12          | 87%
4           | 13          | 78%
5           | 10          | 74%
6           | 14          | 74%
7           | 14          | 87%
8           | 13          | 75%
9           | 14          | 86%
10          | 14          | 82%
11          | 13          | 77%

Measure     | Value
Minimum     | 73%
Mean - 1 sd | 74%
Mean        | 80%
Mean + 1 sd | 86%
Maximum     | 87%

Marcos Kalinowski 159Experimental Software Engineering

Page 160: Experimental Software Engineering

Hypothesis Testing

• Experimental Studies Types

• Hypothesis Testing

• Errors, power and p-value

• Hypothesis Testing Types

• T-test

• Mann-Whitney

• ANOVA, Tukey

• Kruskal-Wallis

Marcos Kalinowski 160Experimental Software Engineering

Page 161: Experimental Software Engineering

Experimental Studies Types

• Hypothesis Testing
– Normal distribution data
• 2 groups: t-test / paired Student's t-test
• 3+ groups: ANOVA, Tukey
– Non-normal distribution data
• 2 groups: Mann-Whitney (Wilcoxon rank-sum test) / Wilcoxon signed-rank test
• 3+ groups: Kruskal-Wallis

• Relationship Exploration
– Normal distribution data: Pearson, Linear Regression
– Non-normal distribution data: Spearman, Non-Linear Regression

Marcos Kalinowski 161Experimental Software Engineering

Page 162: Experimental Software Engineering

Hypothesis Testing

• As seen, an experimental study aims to collect data to confirm or refute a hypothesis

• In general, two hypotheses are defined:
– Null Hypothesis (H0): indicates that the observed differences are coincidental. This is the hypothesis the researcher usually wants to reject with high confidence

– Alternative Hypothesis (H1): represents the hypothesis opposite to the null one, which can be accepted when the null hypothesis is rejected.

• Statistical tests allow the acceptance or rejection of hypotheses

Marcos Kalinowski 162Experimental Software Engineering

Page 163: Experimental Software Engineering

Hypothesis Testing

• In general, Software Engineering tests compare the means between different groups of participants applying different treatments

“Developers using technique Y can complete the requirements analysis task in less time and produce a more complete requirements set than when using technique X”

Null Hypothesis: µ(TimeY) = µ(TimeX)

Alternative Hypothesis: µ(TimeY) ≠ µ(TimeX)

Marcos Kalinowski 163Experimental Software Engineering

Page 164: Experimental Software Engineering

Types of Errors

• The verification of hypotheses always involves some risk, implying that analysis errors can happen

– Type I error (α): happens when the statistical test indicates the existence of a relationship between cause and effect that actually does not exist

– Type II error (β): happens when the statistical test does not indicate a relationship between cause and effect that actually does exist

α = P (type I error) = P (H0 is rejected | H0 is true)

β = P (type II error) = P (H0 is not rejected | H0 is false)

Marcos Kalinowski 164Experimental Software Engineering

Page 165: Experimental Software Engineering

• The null hypothesis is usually built to minimize type I errors

– Consider:

• H0: medicine A = medicine B

• H1: medicine A is better than medicine B

– Errors:

• Type I: the test indicates that medicine A is better than B, but this is not true (they are equal)

• Type II: the test indicates that medicine A is equal to medicine B, but this is not true (A is better)

Types of Errors

Marcos Kalinowski 165Experimental Software Engineering

Page 166: Experimental Software Engineering

Power of Testing

• Indicates the probability of rejecting the null hypothesis when it is false, i.e., the probability of correctly deciding in favor of the alternative hypothesis
– The size of the type II error (β) depends on the power of the test

– The power of the test reflects the probability that the test will find the relationship when the null hypothesis is false

– The statistical test with the highest power should be used to evaluate the hypothesis.

Power = 1 - β

Power = P (H0 rejected | H0 is false)

Marcos Kalinowski 166Experimental Software Engineering

Page 167: Experimental Software Engineering

Significance Level

• Shows the likelihood of a type I error happening

– Most common significance levels (α): 10%, 5%, 1% and 0.1%

– The p-value is the lowest significance level at which the null hypothesis can be rejected

– We say there is statistical significance when the calculated p-value is lower than the adopted significance level

– For instance, when p = 0.0001 one can say that the result is really significant, because this value is much lower than the usually adopted significance levels.

– However, if p = 0.048 one cannot be so sure: although the value is lower than 5%, it is very close to this significance level.

Marcos Kalinowski 167Experimental Software Engineering

Page 168: Experimental Software Engineering

• The decision-making process for a hypothesis test can be based on the probability value (p-value) for the given test:

– If the p-value is less than or equal to a predetermined level of significance (α-level), then you reject the null hypothesis and claim support for the alternative hypothesis

– If the p-value is greater than the α-level, you fail to reject the null hypothesis and cannot claim support for the alternative hypothesis

Significance Level

Marcos Kalinowski 168Experimental Software Engineering

Page 169: Experimental Software Engineering

[Diagram: region of acceptance of H0 vs. region of rejection of H0 under the distribution of the test statistic]

Significance Level

Marcos Kalinowski 169Experimental Software Engineering

Page 170: Experimental Software Engineering

Degrees of Freedom

• Degrees of freedom (DF) is the amount of information (number of values) that is free to vary in the calculation of a statistic

• It is the number of independent values used in the estimation of a parameter

• In general, the number of degrees of freedom of an estimate equals the number of values used in the estimate minus the number of parameters estimated in the intermediate calculations

• So, to calculate the mean of a sample of size n, all n observations are used, and this statistic has n degrees of freedom

• The estimation of the variance using a sample of size n has n - 1 degrees of freedom, because the sample mean must be calculated before the sample variance

Marcos Kalinowski 170Experimental Software Engineering

Page 171: Experimental Software Engineering

Hypothesis Testing Types

• Hypothesis tests can be parametric or non-parametric

• Parametric Tests

– They use specific formulas derived from known frequency distributions. Therefore, the data set to be tested must present a distribution that is:

• Normal: symmetric distribution

AND

• Homoscedastic: the variance is constant

Marcos Kalinowski 171Experimental Software Engineering

Page 172: Experimental Software Engineering

• Non-Parametric Tests

– Should be used when the data distribution does not meet the parametric tests' requirements (normality, homoscedasticity)

– They are less powerful than parametric tests, but do not assume any probability distribution in the data

– They use the ranking of the values instead of the values themselves

Hypothesis Testing Types

Marcos Kalinowski 172Experimental Software Engineering

Page 173: Experimental Software Engineering

Normality

• Frequency distribution graphs of the normal curve (in blue) and some hypothetical data (red vertical lines)

[Figures: data with a distribution similar to normal; data with a non-normal distribution]

Marcos Kalinowski 173Experimental Software Engineering

Page 174: Experimental Software Engineering

Normality Testing

• Kolmogorov-Smirnov (K-S) test

– Evaluates whether two samples have similar distributions, or whether one sample presents a distribution similar to the normal

– Frequently used to identify normality in samples with at least 30 values!

– Detects differences related to central tendency, dispersion and symmetry, but it is very sensitive to long tails (high standard deviation values)

Marcos Kalinowski 174Experimental Software Engineering

Page 175: Experimental Software Engineering

Normality Testing

• Shapiro-Wilk test

– Calculates the W value, which indicates whether the sample xi follows the normal distribution

– Frequently used to identify normality in samples with fewer than 50 values

– Small W values indicate that the distribution is not normal

– Used for small data sets, where extreme values can make the K-S test hard to apply
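
A minimal sketch, assuming SciPy, of both normality tests on a hypothetical sample; note that estimating the normal parameters from the same sample before the K-S test is a simplification.

    from scipy import stats
    import statistics

    sample = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 14.0, 12.2, 13.6, 12.8]  # hypothetical values

    w, p_sw = stats.shapiro(sample)                           # Shapiro-Wilk (small samples)
    mean, sd = statistics.mean(sample), statistics.stdev(sample)
    d, p_ks = stats.kstest(sample, 'norm', args=(mean, sd))   # K-S against a fitted normal

    # If a p-value falls below the significance level, normality is rejected
    # and a non-parametric test should be preferred.
    print(round(p_sw, 3), round(p_ks, 3))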

Marcos Kalinowski 175Experimental Software Engineering

Page 176: Experimental Software Engineering

Homocedasticity

• A set of variables is homoscedastic if the variables have similar variances
– A classical example of lack of homoscedasticity is the relationship between the type of food consumed and salary:

• As a person's salary increases, the variety of food types the person can consume also increases

• A poor person usually spends a roughly constant amount on food, consuming similar products

• A richer person may still consume simple products, but can also consume more sophisticated ones

• In this way, the richer a person is, the more types of food they can consume (and the larger the variance)

Marcos Kalinowski 176Experimental Software Engineering

Page 177: Experimental Software Engineering

• Observed values for a hypothetical study, showing heteroscedasticity between two groups

[Bar charts for Group I and Group II: percentage of observations (y-axis) per category (Very High, High, Medium, Low, Very Low); the two groups show clearly different spreads]

Homoscedasticity

Marcos Kalinowski 177Experimental Software Engineering

Page 178: Experimental Software Engineering

Homoscedasticity Testing

• Levene's Test

– Consider a variable Y, with N distinct values divided into K groups, where Ni is the number of values in group i

– Levene's test evaluates the null hypothesis that the variances of the K groups are homogeneous; this hypothesis is rejected when the test's p-value is lower than the adopted significance level
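
A minimal sketch, assuming SciPy, of Levene's test on two hypothetical groups with visibly different spreads:

    from scipy import stats

    group_1 = [12, 14, 13, 15, 14, 13, 12, 14]   # hypothetical, low spread
    group_2 = [9, 18, 11, 20, 8, 16, 22, 7]      # hypothetical, high spread

    stat, p_value = stats.levene(group_1, group_2)
    # A p_value below the significance level => reject homogeneity of variances
    print(round(stat, 2), round(p_value, 3))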

Marcos Kalinowski 178Experimental Software Engineering

Page 179: Experimental Software Engineering

Hypothesis Testing Types

Experimental Design                  | Parametric Test | Non-Parametric Test
One factor, one treatment            | -               | Binomial
One factor, two random treatments    | T-test          | Mann-Whitney
One factor, two paired treatments    | Paired T-test   | Wilcoxon signed rank
One factor, more than two treatments | ANOVA           | Kruskal-Wallis

Marcos Kalinowski 179Experimental Software Engineering

Page 180: Experimental Software Engineering

T Test or Student-T

• Parametric test used to compare the means of two independent samples
– It is actually a family of tests; different variants are applied according to the sample variances (homoscedastic or not)

– Different variants are also applied depending on whether the samples are independent or paired

– Two samples are paired when there is a unique relation between a value in one sample and a value in the other one.
• Example: a sample before training and a sample after training

– All T tests assume a normal distribution of the data
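
A minimal sketch, assuming SciPy; the times below are hypothetical, and the paired variant is shown only for illustration (it would require the two samples to come from the same participants).

    from scipy import stats

    technique_x = [10, 12, 13, 11, 14, 13, 12]   # hypothetical days per participant
    technique_y = [9, 10, 11, 10, 12, 11, 10]

    t_ind, p_ind = stats.ttest_ind(technique_x, technique_y)   # independent samples, equal variances assumed
    t_rel, p_rel = stats.ttest_rel(technique_x, technique_y)   # paired samples (illustrative only)
    print(round(p_ind, 3), round(p_rel, 3))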

Marcos Kalinowski 180Experimental Software Engineering

Page 181: Experimental Software Engineering

Mann-Whitney Test

• Represents a non-parametric alternative to the T test

– Requires that the samples are independent, with data in ordinal, interval or ratio scales

– To perform the comparison, the samples are pooled and ordered

– The values are transformed into ranks within the pooled group, and the rank sum of the smaller sample (T) is calculated

– Finally, the test statistic is calculated and compared against a table of critical values
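
A minimal sketch, assuming SciPy, applying the Mann-Whitney U test to the same hypothetical samples used in the t-test sketch above:

    from scipy import stats

    technique_x = [10, 12, 13, 11, 14, 13, 12]   # hypothetical data
    technique_y = [9, 10, 11, 10, 12, 11, 10]

    u, p_value = stats.mannwhitneyu(technique_x, technique_y, alternative='two-sided')
    print(u, round(p_value, 3))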

Marcos Kalinowski 181Experimental Software Engineering

Page 182: Experimental Software Engineering

ANOVA – Variance Analysis

• Statistical technique aimed at testing the equality of the means of two or more groups

– Allows the comparison of means from different treatments, being used as an extension of the T test

– Evaluates whether the variability within the groups is greater than the variability among the groups

– The technique assumes independence, normality and homoscedasticity of the group samples.

• As its aim is to evaluate whether the means are equal, independently of the factor, the ANOVA null hypothesis establishes that the variations attributable to the factor must be equal to zero.

Marcos Kalinowski 182Experimental Software Engineering

Page 183: Experimental Software Engineering

Tukey Test

• Test to compare means

– Usually used when ANOVA identifies a difference among the means of multiple samples

• ANOVA shows the means are different, but does not show which ones are different

– The Tukey test supports the identification of the means that do differ

Marcos Kalinowski 183Experimental Software Engineering

Page 184: Experimental Software Engineering

Kruskal-Wallis Test

• A non-parametric alternative to the analysis of variance (ANOVA)

– As with most non-parametric tests, this test is based on the ranking of the values rather than the values themselves.
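
A minimal sketch, assuming SciPy, running one-way ANOVA and its non-parametric alternative (Kruskal-Wallis) on three hypothetical treatment groups:

    from scipy import stats

    group_a = [12, 14, 13, 15, 14]   # hypothetical observations per treatment
    group_b = [10, 11, 12, 10, 11]
    group_c = [16, 18, 17, 19, 18]

    f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)   # parametric
    h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)       # non-parametric (rank-based)
    print(round(p_anova, 4), round(p_kw, 4))
    # If ANOVA indicates a difference, a post-hoc test such as Tukey's HSD
    # can identify which pairs of means actually differ.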

Marcos Kalinowski 184Experimental Software Engineering

Page 185: Experimental Software Engineering

Experimental Software Engineering

Prof. Marcos [email protected]