Uploaded by alexia-newton
Experimentation in Computer Science – Part 3
Experimentation in Software Engineering --- Outline
• Empirical Strategies
• Measurement
• Experiment Process (Continued)
Experiment Process: Phases
[Figure: the experiment process. An experiment idea passes through Experiment Definition, Experiment Planning, Experiment Operation, Analysis & Interpretation, and Presentation & Package, which yield the conclusions.]
Experiment Planning: Overview
[Figure: the planning steps between Experiment Definition and Experiment Operation: Context Selection, Hypothesis Formulation, Variables Selection, Selection of Subjects, Experiment Design, Instrumentation, Validity Evaluation.]
Experiment Planning: Instrumentation
Instrumentation types:
Objects (e.g., specs, code)
Guidelines (e.g., process descriptions, checklists, tutorial documents)
Measurement instruments (e.g., surveys, forms, automated data collection tools)
Overall goal of instrumentation: provide the means to perform and monitor the experiment without affecting its control (instrumentation must not affect outcomes)
Experiment Planning: Validity Evaluation
Threats to external validity concern the ability to generalize results outside the experimental setting
Threats to internal validity concern the ability to conclude that a causal effect exists between independent and dependent variables
Threats to construct validity concern the extent to which variables and measures accurately reflect the constructs under study.
Threats to conclusion validity concern issues that affect our ability to draw accurate statistical conclusions
Experiment Planning: Process and Related Threats
[Figure: in theory, a cause construct leads to an effect construct (the cause-effect construct, stated as a hypothesis); in observation, a treatment (independent variable) leads to an outcome (dependent variable) (the treatment-outcome construct). A second version of the figure marks where threats arise: construct validity links each construct to its treatment or outcome, internal and conclusion validity concern the treatment-outcome relationship, and external validity concerns generalizing beyond the experiment.]
Experiment Planning: Threats to External Validity
Population: subject population not representative of population we wish to generalize to
Place: experimental setting or materials not representative of setting we wish to generalize to
Time: experiment is conducted at a time that affects results
Reduce external validity threats in a given experiment by making the environment as realistic as possible; however, reality is not homogeneous, so it is important to report the characteristics of the environment.
Reduce external validity threats in the long term through replication.
Experiment Planning: Threats to Internal Validity
Instrumentation: measurement tools report inaccurately or affect results
Selection: groups selected are not equivalent
Learning: subjects learn over the course of the experiment, altering later results
Mortality: subjects drop out of the experiment
Social effects: e.g., control group resents treatment group (demoralization or rivalry)
Reduce internal threats through careful experiment design.
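Careful design often includes random assignment, which addresses the selection threat, and, when every subject applies every treatment, counterbalancing the treatment order so that a learning effect does not favor one treatment. The sketch below is a minimal illustration under assumed inputs: the helpers `assign_groups` and `counterbalanced_orders`, the subject IDs, and the treatment names are hypothetical, not taken from the slides.

```python
import random


def assign_groups(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in groups of (near-)equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects round-robin so group sizes differ by at most one.
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


def counterbalanced_orders(subjects, treatments):
    """Alternate treatment order in a two-treatment within-subjects design.

    Half the subjects apply the treatments in the given order and half in
    reverse, so a learning effect does not systematically favor one treatment.
    Assumes `subjects` is already in random order.
    """
    forward = tuple(treatments)
    backward = forward[::-1]
    return {s: (forward if i % 2 == 0 else backward) for i, s in enumerate(subjects)}


if __name__ == "__main__":
    subjects = [f"S{i}" for i in range(1, 9)]  # hypothetical subject IDs
    print(assign_groups(subjects, ["T1", "T2"], seed=42))
    print(counterbalanced_orders(subjects, ["T1", "T2"]))
```

Recording the seed in the experiment package lets others reproduce the exact assignment when replicating the study.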
Experiment Planning: Threats to Construct Validity
Inadequate preoperational explication of constructs: theory isn’t clear enough (e.g. what is “better”)
Mono-operation or mono-method bias: using a single independent variable, case, subject, treatment, or measure may under-represent constructs
Levels of constructs: using incorrect levels of constructs may confound presence of construct with its level
Interaction of testing and treatment: testing itself makes subjects sensitive to the treatment, so the test becomes part of the treatment
Social effects: experimenter expectancy, evaluation apprehension, hypothesis guessing
Reduce construct threats through careful design, and replication.
Experiment Planning: Threats to Conclusion Validity
Low statistical power: increases risk of being unable to reject a false null hypothesis
Violated assumptions of statistical tests: some tests have assumptions, e.g. about normally distributed and independent samples
Fishing: searching for a specific result causes analyses to not be independent, and researchers may influence results by seeking specific outcomes
Reliability of measures: if you can’t measure the result twice with equal outcomes, measures aren’t reliable
Reduce conclusion validity threats through careful design, and perhaps through consultation with statistical experts
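Low statistical power can be addressed at design time by estimating power before running the experiment. A minimal sketch, assuming a two-sided comparison of two independent groups of equal size and a standardized effect size d (difference in means divided by the common standard deviation). It uses the normal approximation to the t-test, which slightly overstates power for small samples; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Normal approximation to the t-test; d is the standardized effect size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = d * sqrt(n_per_group / 2)    # expected value of the test statistic
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def min_group_size(d, target_power=0.8, alpha=0.05):
    """Smallest number of subjects per group whose approximate power reaches target_power."""
    n = 2
    while approx_power(d, n, alpha) < target_power:
        n += 1
    return n
```

For example, `min_group_size(0.5)` returns 63 subjects per group for 80% power at α = 0.05; an exact t-test calculation gives a slightly larger number.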
Experiment Planning: Priorities Among Validity Threats
Decreasing some types of threats may cause others to increase (e.g., using CS students increases group size and reduces heterogeneity, which aids conclusion validity but reduces external validity).
Tradeoffs need to be considered for the type of study:
Theory testing is more interested in internal and construct validity than in external validity
Applied experimentation is more interested in external and possibly conclusion validity
Experiment Operation: Overview
Experiment operation: carrying out the actual experiment and collecting data
Three phases: preparation, execution, data validation
Experiment Operation: Preparation
Locate participants
Offer inducements to obtain participants
Obtain participant consent, and possibly also IRB approval
Consider confidentiality (maintain it, and inform participants about it)
Avoid deception where it affects participants; if deception is used, reveal it later and discuss why it was necessary (beware validity tradeoffs: providing information is good but may affect results)
Prepare instrumentation (objects, guidelines, tools, forms)
Use pilot studies and walkthroughs to reduce threats
Experiment Operation: Execution
Execution might take place over a small set of specified occasions, or across a long time span
Data collection takes place: subjects or interviewers fill out forms, tools collect metrics
Consider interaction between experiment and environment, e.g., if experiment is being performed in-vivo, watch for confounding effects (experiment process altering behavior)
Experiment Operation: Data Validation
Verify that data has been collected correctly
Verify that data is reasonable
Consider whether outliers exist and should be removed (removal requires good reasons)
Verify that the experiment was conducted as intended
Post-experiment questionnaires can assess whether subjects understood instructions
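Several of these checks can be automated once data is recorded in a structured form. A minimal sketch, assuming each record is a dictionary with hypothetical fields `subject`, `treatment`, `time_minutes`, and `defects_found`, and an assumed plausibility limit of 240 minutes per task. It only flags problems; deciding whether to correct or remove data stays with the experimenter.

```python
def validate_records(records, treatments, max_minutes=240):
    """Return (index, problem) pairs for suspicious records.

    Field names and the time limit are hypothetical; adapt them to the
    experiment's own data collection forms.
    """
    problems = []
    seen = set()
    required = ("subject", "treatment", "time_minutes", "defects_found")
    for i, rec in enumerate(records):
        # Was the data collected completely?
        missing = [f for f in required if rec.get(f) is None]
        if missing:
            problems.append((i, f"missing fields: {', '.join(missing)}"))
            continue
        # Is the data reasonable?
        if rec["treatment"] not in treatments:
            problems.append((i, f"unknown treatment {rec['treatment']!r}"))
        if not 0 < rec["time_minutes"] <= max_minutes:
            problems.append((i, f"implausible time {rec['time_minutes']}"))
        if rec["defects_found"] < 0:
            problems.append((i, "negative defect count"))
        # Was each subject measured once per treatment, as intended?
        key = (rec["subject"], rec["treatment"])
        if key in seen:
            problems.append((i, "duplicate entry for subject/treatment"))
        seen.add(key)
    return problems
```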
Analysis and Interpretation: Overview
Quantitative interpretation can include:
Descriptive statistics: describe and graphically present the data set; used before hypothesis testing to better understand the data and identify outliers
Data set reduction: locate and possibly remove anomalous data points
Hypothesis testing: apply statistical tests to determine whether the null hypothesis can be rejected
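Descriptive statistics can be computed with Python's standard `statistics` module. A minimal sketch; the `describe` helper and the choice of summary values are illustrative, and the quartiles depend on the interpolation method (`statistics.quantiles` defaults to the "exclusive" method). At least two data points are assumed.

```python
import statistics


def describe(sample):
    """Summarize a sample (at least two values) with common descriptive statistics."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # first and third quartiles
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
        "min": min(sample),
        "q1": q1,
        "q3": q3,
        "max": max(sample),
    }
```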
Analysis and Interpretation: Visualizing Data Sets
Graphs are effective ways to provide an overview of a data set
Basic graph types for visualization: scatter plots, box plots, line plots, bar charts, cumulative bar charts, pie charts
Analysis and Interpretation: Data Set Reduction
Hypothesis testing techniques depend on the quality of the data set; data set reduction improves data set quality by removing anomalous data (outliers)
Outliers can be removed, but only for good reasons, such as that they represent rare events not likely to occur again
Scatter plots can help find outliers
Statistical tests can determine the probability that a point is an outlier
When data contain too much redundancy, they can be hard to analyze; factor analysis and principal component analysis can identify orthogonal factors to replace the redundant ones
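Box plots commonly mark outliers with a rule of thumb known as Tukey's fences: points more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch applies that rule; it is one heuristic among several rather than a statistical test, and the helper name and the factor k = 1.5 are assumptions. It flags points only, since removing them still needs a documented reason.

```python
import statistics


def tukey_outliers(sample, k=1.5):
    """Flag values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR].

    Returns the flagged values; whether to remove them is a separate,
    documented judgement.
    """
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in sample if x < low or x > high]
```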
Analysis and Interpretation: Hypothesis Testing
Hypothesis testing: can we reject H0? If statistical tests say we can't, we draw no conclusions. If tests say we can, we reject H0 at a given significance level α:
$\alpha = P(\text{type-I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$
We also calculate the p-value: the lowest significance level at which we can reject H0
Typically, α is 0.05; to claim significance, the p-value must be less than α
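The slides do not prescribe a particular test at this point; as one illustration of α and the p-value, the sketch below runs an exact permutation test for a difference in means between two small groups. Under H0 (no treatment effect) the group labels are exchangeable, so the p-value is the fraction of all relabelings whose mean difference is at least as extreme as the observed one. Enumerating every relabeling is only practical for small samples, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    # Try every way of choosing len(a) values for the first group.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        ga = [pooled[i] for i in chosen]
        gb = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(ga) - mean(gb)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    alpha = 0.05  # significance level, i.e. accepted P(type-I error)
    p = permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8])
    print(f"p = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

With groups [1, 2, 3, 4] and [5, 6, 7, 8], only 2 of the 70 relabelings are as extreme as the observed split, so p ≈ 0.029 < 0.05 and H0 would be rejected at α = 0.05.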
Analysis and Interpretation: Statistical Tests per Design
Design | Parametric | Non-parametric
One factor, one treatment | - | Chi-2, Binomial test
One factor, two treatments, completely randomized | t-test, F-test | Mann-Whitney, Chi-2
One factor, two treatments, paired comparison | paired t-test | Wilcoxon, Sign test
One factor, more than two treatments | ANOVA | Kruskal-Wallis, Chi-2
More than one factor | ANOVA | -
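As an example of a non-parametric test from the table, the sign test for a paired comparison uses only the signs of the within-pair differences, compared against a Binomial(n, 0.5) distribution under H0. A minimal sketch, assuming ties are discarded (a common convention) and a two-sided alternative; `sign_test` is a hypothetical helper, and statistical packages also provide this test.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired observations."""
    diffs = [y - x for x, y in zip(before, after) if y != x]  # ties are dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(1 for d in diffs if d > 0)  # pairs where `after` is larger
    tail = min(k, n - k)
    # P(X <= tail) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    p_one_side = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_side)
```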
Analysis and Interpretation: Statistical Tests
It is important to choose the right test; the test must be appropriate for the type of data:
Are the data items paired or not?
Is the data normally distributed or not?
Are the data sets completely independent or not?
Take a stats course, see texts such as Montgomery, consult with statisticians, use statistical packages
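The design table can be encoded as a lookup, which makes the questions above explicit. A minimal sketch; `suggest_tests` and its parameters are hypothetical, `parametric_ok` stands for the judgement that parametric assumptions such as normality hold, and the result is only a list of candidates from the table, not a substitute for checking each test's assumptions.

```python
def suggest_tests(factors, treatments, paired=False, parametric_ok=False):
    """Return candidate tests from the design table for a given design."""
    if factors > 1:
        # The table lists only ANOVA (parametric) for multi-factor designs.
        return ["ANOVA"] if parametric_ok else []
    if treatments == 1:
        return ["Chi-2", "Binomial test"]  # only non-parametric entries listed
    if treatments == 2 and paired:
        return ["paired t-test"] if parametric_ok else ["Wilcoxon", "Sign test"]
    if treatments == 2:
        return ["t-test", "F-test"] if parametric_ok else ["Mann-Whitney", "Chi-2"]
    # One factor, more than two treatments
    return ["ANOVA"] if parametric_ok else ["Kruskal-Wallis", "Chi-2"]
```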
Analysis and Interpretation: Statistical vs. Practical Significance
Statistical significance does not imply practical importance. E.g., if T1 is shown with statistical significance to be 1% more effective than T2, it must still be decided whether 1% matters
Lack of statistical significance does not imply lack of practical importance. The fact that H0 cannot be rejected at level α does not mean that H0 is true, and results of high practical importance may justify using a less strict significance level (a larger α)
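One common way to report practical magnitude alongside a p-value is a standardized effect size such as Cohen's d: the difference in means divided by the pooled standard deviation. The slides do not name this measure; the sketch below is an illustration with a hypothetical helper name, assuming two independent samples with at least two observations each.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)
```

Unlike a p-value, d does not shrink toward significance as the sample grows, so it helps answer whether a difference such as "1% more effective" matters in practice.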
Presentation: An Outline for an Experiment Report
1. Introduction, Motivation
2. Background, Prior Work
3. Empirical Study
3.0 Research Questions
3.1 Objects of analysis
3.1.1 participants
3.1.2 objects
3.2 Variables and measures
3.2.1 independent variables
3.2.2 dependent variables
3.2.3 other factors
3.3 Experiment setup
3.3.1 setup details
3.3.2 operational details
3.4 Analysis strategy
3.5 Threats to validity
3.6 Data and analysis
4. Interpretation
5. Conclusions
Presentation Issues
• Supporting replicability
• What to say and what not to say?
• How much to say?
• Describing design decisions