RESEARCH METHODS IN COMPUTER SCIENCE
NORDIN ABU BAKAR, 2011

Research Design: An Empirical Approach
Introduction
Computer science, like other scientific disciplines, needs specific methods and tests to justify the results produced in research. Much research in CS leaves this statistical part out of the discussion and puts a demonstration of the prototype forward as the ultimate justification of the findings. For research to be widely accepted, CS researchers must accompany their results with justification that the numbers are valid and correct. A number of approaches can be used to carry out research in CS; the empirical approach is one of them. The empirical approach uses statistics as one of the ways to analyse findings and test hypotheses. Analysing data statistically requires researchers to understand some basic notions in statistics, which this chapter is about to explore.
There are several reasons why statistical analysis is relevant in computer science:
- The analysis explains the results on a common platform that everyone can understand.
- It is an explanation of the situation being studied; this understanding gives a clearer picture than was available before.
- It measures whether the research has been successfully executed.
- It answers the research questions.
- It evaluates whether the topic currently being investigated has more to offer or is a dead end, in which case the researcher should find another way to understand the situation.
The notion that statistical analysis is too hard to work with is a myth; properly handled and patiently studied, it will benefit the researcher in the long run. The knowledge gained from each analysis remains with the researcher for tackling the next problem. It grows, and after each research project a researcher will feel more confident and try to get the most out of the process.
Level of Measurement

Each level of measurement admits different statistical tests:

- Nominal scale  -> chi-square test
- Ordinal scale  -> non-parametric tests
- Interval scale -> t-test / F-test
- Ratio scale    -> all tests
Nominal Scale

The word nominal means "to name". In statistics, numbers are assigned to variables only to classify or categorize them, such as 1 = male, 2 = female. In this manner the data set can be easily manipulated, which aids data analysis. The only arithmetic relevant for such data is counting. The statistical usage is very limited and is normally good for keeping track of people, objects and events. The common statistical test that can be performed on this kind of data set is the chi-square test.
Ordinal Scale

Numbers or values are assigned to objects or events to establish rank or order, such as 1st, 2nd, 3rd or 4th position. Intervals of the scale are not equal, i.e., adjacent ranks need not differ by the same amount, so no more precise comparisons are possible. The median is an appropriate measure of central tendency; a percentile or quartile is used for measuring dispersion, and rank-order correlations are possible. The statistical tests suited to this data are non-parametric tests. This scale is commonly used in qualitative research.

Ordinal-level question:
In which category was your income last year?
1. Above RM100K
2. RM50K to RM100K
3. Below RM50K
Interval Scale

Numbers or values are assigned to objects or events which can be categorized, ordered and assumed to have equal distance between scale values. Typical examples are degrees of temperature (e.g. 72 degrees Fahrenheit) or a test score (0 to 100). There is no absolute zero or unique origin; only an arbitrary zero can be had, and hence no capacity to measure the complete absence of a trait or characteristic. This type of data is more powerful than the ordinal scale due to the equality of intervals. The sample mean is an appropriate measure of central tendency, and the standard deviation (SD) is widely used for dispersion. The common statistical tests for this data are the t-test and F-test for significance.
Ratio Scale

The numbers in this data set represent objects or events which can be categorized, ordered and assumed to have equal distance between scale values, and which have a real zero point. The values can be used in any statistical test whose requirements the data meet. This is the highest level of measurement: all mathematical operations and statistical techniques can be applied, and all manipulations possible with real numbers can be carried out.

Ratio-level question:
What was your income last year?
Figure 1 : Steps in The Planning Phase
Population
Population refers to the entire group of subjects to be studied. The size of the population is very important to determine because of the need for inference later on. The validity of the results as a reference to the whole population depends on how much data are collected to infer about the entire population.
Sampling

Sampling is the process of drawing some elements from the population and analysing them. Since the sampling unit is a subgroup of the objects under study, it must reflect the whole group as much as possible; failing to do so will jeopardize the data, the results and the findings. In the sampling process we need to define a sampling frame and sampling methods. The sampling frame represents the elements in a population from which a sample is drawn. It could be membership lists, staff directories, registered students, zakat recipients, licensed traders, and the like. With a clearly defined group, the researcher can determine who should be included and who should be excluded, which minimizes the amount of error in the data. Once the sampling frame is determined, a researcher can select an appropriate sampling method.
How to determine the sample size

The sample must represent the entire population of subjects being studied. To avoid error due to misrepresentation, determining the sample size for a simple random sample requires specifying:
- the level of confidence;
- the acceptable amount of error;
- the value of the SD or proportion.
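The three quantities above combine in the standard sample-size formula for estimating a mean with a simple random sample, n = (z * SD / error)^2. The sketch below is a minimal illustration; the z value, SD and tolerated error are hypothetical inputs, not values from this chapter.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Sample size needed to estimate a population mean to within
    +/- error, given the population SD (sigma) and the z value
    for the chosen confidence level (1.96 for 95%)."""
    return math.ceil((z * sigma / error) ** 2)

# Hypothetical example: 95% confidence (z = 1.96), SD = 15, tolerated error = 3
n = sample_size_for_mean(1.96, 15, 3)
print(n)  # 97
```

For estimating a proportion, the same shape applies with the variance term replaced by p(1 - p).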
Research Design
Research methodologies are commonly characterized by their research designs. Each research design specifies the method used in the experiment to collect the data. A research design is defined as a plan for conducting research, which usually includes a specification of the elements to be examined and the methods to be used.
The research design is selected as the most suitable and feasible method for testing the hypotheses or answering the research questions. The diagram in Figure 2 shows the different research designs.
Figure 2 : Research Designs
True Experimental Design

In experimental design, a specification for a research study is laid out to answer a specific research question, such as: does variable A cause variable B to increase in value?
The plan must include:
- methods of selecting and assigning subjects;
- the number and types of variables.
The main purpose of the experiment is to apply some controls and study the cause-and-effect relationships among the variables. The variables known as independent variables are assumed to cause changes in the ones known as dependent variables. The experiment will then show, with exact values, whether or not the assumption is correct and acceptable. Control mechanisms in experimental design play an important role in the validity as well as the reliability of the results. To make sure that the results are valid, the researcher must be able to:
- collect a good amount of data from at least two comparison groups;
- apply random selection;
- manipulate the independent variable to apply different treatments.
The threats to validity in experimental design are:
1. Events which occur between the first and second measurements.
2. Changes in the subjects during the course of the experiment.
3. Subjects might change their responses after the first measurement, so the second measurement might differ because of this knowledge; the researcher or research assistant might also change the way they execute the measurement.
Factorial Design

This design is a true experimental design used whenever the research has more than one independent variable. Other true experimental designs include:
- Solomon four-group design
- Pretest-posttest design
- Posttest-only design
Figure 3: Steps in The Action Phase
K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85
Mean(K) = Total / Count = 611 / 10 = 61.1
The mean represents the total picture of the group or list of numbers. It takes into account all values in the group, which is a positive justification for letting the mean represent the group. However, the mean works well only if the values are well distributed; if there is one odd value (an outlier), the mean can be well off. Deriving the mean can also be a constraint if the data set is large.

The median is the middle number in a list. When the numbers in a list are placed in order, the median is the middle number if the count of numbers is odd, or the value halfway between the two middle numbers if the count is even. For example:
K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85
Median(K) = (Mid1 + Mid2) / 2 = (55 + 66) / 2 = 60.5
Compared to the mean, the median resolves the outlier problem, because an odd value does not affect its derivation. The process is also simple and easy to carry out, as it involves no numerical computation. But because there is no numerical computation, the value taken as the median is not precise and does not tell much about the data.
The mode is the number that occurs most frequently in the list. For example:
K = 2, 2, 3, 4, 5, 4, 5, 4, 4, 4
Mode(K) = most frequent number = 4
Finding the mode is quite simple and straightforward. Due to its simplicity, however, the mode is very raw and can be questionable, especially when there is little difference among the frequencies in the data.
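As a quick sketch, Python's standard `statistics` module reproduces the three measures for the lists used above:

```python
import statistics

K = [45, 30, 67, 89, 55, 66, 75, 49, 50, 85]
print(statistics.mean(K))    # 61.1
print(statistics.median(K))  # 60.5, i.e. (55 + 66) / 2

M = [2, 2, 3, 4, 5, 4, 5, 4, 4, 4]
print(statistics.mode(M))    # 4
```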
Standard Deviation
Standard deviation (SD) is a measure of dispersion from the mean. After a value for the mean has been determined, the distance of the other data from the mean can be calculated; this is the standard deviation (SD). A large SD indicates that the data are widely distributed, with many values far from the mean. On the other hand, if the SD is small, most values are very close to the mean; if the SD is zero, there is no dispersion and all values are the same. The SD positions a value above or below the mean, and this can be used to evaluate that particular value against the rest of the values in the data list.
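As a sketch, the SD of the list used earlier can be computed with the standard `statistics` module (here the sample SD, which divides by n - 1), and a value can then be positioned relative to the mean in SD units:

```python
import statistics

K = [45, 30, 67, 89, 55, 66, 75, 49, 50, 85]
mean = statistics.mean(K)
sd = statistics.stdev(K)  # sample standard deviation, divisor n - 1
print(round(mean, 1), round(sd, 2))  # 61.1 18.66

# Position a value relative to the mean, in SD units (z score):
z = (89 - mean) / sd
print(round(z, 2))  # 1.49
```

A z score near +1.5, as here, says the value 89 sits well above the mean but still within the bulk of the data.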
Correlation
When the objective of the research is to find relationships between variables, correlation analysis is inevitable. In computer science this research element is also present and very common. For example, a study might want to find the relationship between parameter A and parameter B in an application, between software errors and testing procedures or programmer attitudes, or between machine architecture and speed of execution. If one variable scores as highly as the other variable, the relationship is referred to as a positive correlation. On the other hand, if one variable scores highly while the other scores low, there is a negative correlation. There are also cases where the pairs are scattered around and do not present a cohesive trend; such a relationship is deemed zero correlation.

To determine the type of correlation for a data set, a scatter graph will do the job, but for a more precise numerical value a statistical test must be performed. Pearson's r and Spearman's rho are two examples of statistical tests that facilitate correlation analysis. The value produced by these tests is the correlation coefficient, which lies in the range -1 to +1. A coefficient close to +1 indicates a strong positive correlation, one close to -1 a strong negative correlation, and one close to 0 a weak correlation.
Let's say that after the test, the coefficient between X and Y is 0.8 (r = 0.8) and the coefficient between X and Z is 0.4 (r = 0.4). The interpretation of the correlation coefficient is quite tricky and not as straightforward as it may seem. The correlation between X and Y corresponds to 64% (100 * r^2), meaning that 64% of the variation in Y is shared with X. The correlation between X and Z corresponds to only 16% (100 * r^2) of shared variation. However, it is important to note that correlation analysis does not tell us whether X causes Y to change; it only indicates that correlated change exists. If one wants to know whether X causes Y, then causal research should be employed, where specific experiments are carried out in the lab to find out whether X causes Y to change. This leads us to another interesting tool in statistical analysis: hypothesis testing.
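As an illustration of r and 100 * r^2, here is a plain-Python Pearson coefficient; the paired scores are hypothetical and `pearson_r` is our own helper, not a library call:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired scores
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(round(r, 2))            # 0.77: a fairly strong positive correlation
print(round(100 * r * r, 1))  # 60.0: percent of shared variation
```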
Correlation Analysis with SPSS
What is it?

Very often researchers need to explain why things differ, and explaining the circumstances in a rational manner requires a reasonable way to analyze the data and draw a conclusion. Do students who are good in math achieve a higher CGPA, and do those who are not good in math get a lower CGPA? What we are trying to understand is whether the variable "good in math" correlates with the variable "CGPA". If this were the case, we would say there is a positive correlation between the variables.

Positive correlation means that as a score on one variable increases, the corresponding score on the other variable increases as well (in SPSS, the value will be positive). In the above context, if the student is good in math, his CGPA will be higher as well.

There are also situations where, as a score on one variable goes up, the score on the other variable goes down. This is referred to as negative correlation (in SPSS, the value in the table will be negative). One example of such a correlation is between a person's weight and their health: as the weight goes up, the less healthy that person tends to be.
What is it for?
It measures the strength and direction of the linear relationship between a pair of variables. If we have more than two variables, then we need a multivariate analysis.
How to use it?
Using SPSS, the steps are as follows:
- Choose Statistics -> Correlate -> Bivariate; the Correlations dialogue box will appear.
- Select the variables to correlate and move them to the Variables box.
- Choose the appropriate correlation coefficient:
  o Interval data -> Pearson
  o Ordinal data -> Spearman
- Select a one-tailed or two-tailed test:
  o One-tailed -> direction of the relationship is known
  o Two-tailed -> direction is unknown
- Click OK.
A sample output is as follows:

                         Variable A   Variable B   Variable C
Pearson Correlation
  Variable A             1.000        0.690*       0.840*
  Variable B             0.690*       1.000        0.750*
  Variable C             0.840*       0.750*       1.000
Sig. (2-tailed)
  Variable A             .            0.002        0.005
  Variable B             0.002        .            0.000
  Variable C             0.005        0.000        .
N
  Variable A             20           20           20
  Variable B             20           20           20
  Variable C             20           20           20

* Correlation is significant at the 0.01 level (2-tailed).
Testing the hypothesis
A hypothesis is an assumption one can make about an effect on certain variables. For example:
1. Parameter A is better than parameter B.
2. Selection sort performs faster than binary sort.
3. Cost affects software performance.
Using statistical techniques, the outcome of this process, whether the hypothesis is accepted or rejected, can be properly justified and supported. The steps in hypothesis testing are as follows:
1. Choose a null hypothesis: make the opposite assumption about the vital variable of your study. If the study wants to prove A, choose B (the opposite of A) as the null hypothesis. The trick is that when B is rejected, A will be statistically supported.
2. Choose an alternative hypothesis that can be accepted in case the null hypothesis (in 1) is rejected. This is A (as in 1); if B is rejected, A can be accepted even though there is no direct evidence that it is true.
3. Set the condition for when to reject the hypothesis and when not to reject it.
4. Draw a random sample and select a statistical method.
5. Based on the test, choose to reject or not to reject the null hypothesis.
6. Accept the alternative hypothesis if the null hypothesis is rejected.
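The steps above can be sketched on a hypothetical comparison of the run times of two sort implementations: H0 says the mean times are equal, Ha says they differ. The data and the approximate critical value (about 2.23 for roughly 10 degrees of freedom, two-tailed, at the 5% level) are illustrative assumptions:

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic for H0: mean(a) == mean(b)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

# Hypothetical run times (ms) of two sort implementations
A = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
B = [13.0, 13.4, 12.9, 13.1, 13.3, 12.8]

t = welch_t(A, B)
reject_h0 = abs(t) > 2.23  # critical region: |t| beyond the critical value
print(round(t, 2), reject_h0)
```

Here |t| falls deep in the critical region, so H0 is rejected and the alternative hypothesis is accepted.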
Hypothesis Testing

To describe the process of hypothesis testing, we feel we cannot do better than follow the five-step method introduced by Neave (1976a, as it appears in Kanji (1999)):

Step 1
Formulate the practical problem in terms of hypotheses. The focus should go into creating the alternative hypothesis, Ha, since this is the more important from a practical point of view. It should express the range of situations that we wish the test to be able to diagnose; in this sense a positive test indicates that we should take action of some kind. Once this is fixed, it should be obvious whether we carry out a one- or two-tailed test. The null hypothesis, H0, needs to be very simple and represents the status quo, i.e., that there is no difference between the processes being tested. It is basically a standard or control against which the evidence pointing to the alternative can be compared.

Step 2
Calculate a statistic T, a function purely of the data. All good test statistics should have two properties: (a) they should tend to behave differently when H0 is true from when Ha is true; and (b) their probability distribution should be calculable under the assumption that H0 is true. It is also desirable that tables of this probability distribution exist.

Step 3
Choose a critical region. One should decide on the kind of values of T which will most strongly point to Ha being true rather than H0. A value of T lying in a suitably defined critical region will lead us to reject H0 in favour of Ha; if T lies outside the critical region, we do not reject H0. We should never conclude by accepting H0.
Step 4
Decide the size of the critical region. This involves specifying how great a risk we are prepared to run of coming to an incorrect conclusion.
Chi-Square Test
What is it?

The data that have been collected need to be processed and analyzed. If the data are non-quantitative, i.e., not numerical but categorical criteria such as sex or having a headache, then the chi-square test can be used. Is there a connection between these two criteria, we may ask in the research? In statistics, this is called a measure of association: the research looks into the association between two variables which are not numeric in nature.

What is it for?
- Measures association between two variables
- Measures the level of a distribution in the population
How to use it?

The test can be used if a table of frequencies can be produced. Using SPSS, the following steps will derive the results:
- Choose Analyze -> Summarize -> Crosstabs; SPSS will pop up a dialogue window.
- Select the appropriate variables, then click on Statistics and choose the chi-square test.
- Click OK.
The results may be as follows:

                               Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square             43.617a   4     .000
Likelihood Ratio               46.826    4     .000
Linear-by-Linear Association   41.263    1     .000
N of Valid Cases               250

a. 0 cells (.0%) have an expected count less than 5. The minimum expected count is 12.10.
Column 2 gives the value of the test statistic, column 3 states the degrees of freedom (df), and column 4 indicates whether the result is significant. Compare the Pearson chi-square value against the table value with df = 4 at the 0.05 level of significance. If the computed value is greater than the one in the table (equivalently, if the significance in column 4 is below 0.05), the null hypothesis (H0) is rejected and the alternative hypothesis is accepted.
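As a sketch of the computation behind such output, here is a plain-Python chi-square statistic for a small hypothetical sex-by-headache table; the counts and the critical value (3.84 for df = 1 at the 0.05 level) are illustrative:

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table: rows = sex, columns = headache yes/no
observed = [[30, 20],
            [10, 40]]
chi2 = chi_square(observed)
print(round(chi2, 2), chi2 > 3.84)  # 16.67 True -> reject H0
```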
Confidence Interval
When the measurements have been collected from the experiments or tests, there is a set of numerical values that needs to be analysed. Some relevant questions are:
1. How certain can we be of the values?
2. If there are two sets of values, how certain can we be that the two sets are different?

The need to answer these questions leads us to more statistical analysis and its vital role in computer science research. Let's say the sample mean is 0.248. So what? What does it mean? Is it good or bad? There must be a way to justify this value and apply some evaluation criteria to the number. How confident are we that the value is the true mean? In other words, we want a confidence interval that brackets the value as follows: if a sample of 40 was drawn and the mean calculated, 95% of the time the mean would lie between a lower bound (lb) and an upper bound (ub) such that lb < Mk < ub. The bootstrap method is suitable for this, and it is done in the following steps:
- Draw 1000 random samples (with replacement) of size 40 from our original 40 points.
- Take the mean of each sample.
- Sort the means and take the values at the 25th and 975th positions.

Suppose the lower bound is 0.2451 and the upper bound is 0.2505. Since
0.2451 < 0.248 < 0.2505,
the calculated mean can be accepted as the true mean of the population above.
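The three steps above can be sketched in Python. Since the chapter's original 40 data points are not listed, the sample below is synthetic (drawn around 0.248), so the printed bounds will differ from 0.2451 and 0.2505:

```python
import random
import statistics

def bootstrap_ci(data):
    """95% percentile bootstrap confidence interval for the mean."""
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))  # resample with replacement
        for _ in range(1000)
    )
    return means[25], means[975]  # 25th and 975th of the 1000 sorted means

random.seed(1)
sample = [random.gauss(0.248, 0.01) for _ in range(40)]  # synthetic stand-in data
lb, ub = bootstrap_ci(sample)
print(lb < statistics.mean(sample) < ub)  # the sample mean lies inside the interval
```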
The Bootstrap Method
The bootstrap method is an attractive procedure for CS researchers as it offers an escape from the usual complicated statistical procedures. It suits the CS environment nicely due to its flexibility in the sampling criteria: the samples of data used to find the confidence intervals may have distributions that depart from the traditional parametric distributions. The difficulty of producing enough data for standard statistical procedures is very common in CS research, and many have resorted to staying away from statistical analysis altogether. The bootstrap method gives an opportunity to produce statistically reliable analysis regardless of the form of the data's probability density; it makes no assumption about the underlying data distribution. Probably the defining point of the bootstrap method is that the entire sampling distribution is estimated by relying on the fact that the sample's distribution is a good estimate of the population distribution. Traditional parametric inference, on the other hand, depends on the assumption that the sample and the population are normally distributed.

The bootstrap method was initially proposed by Efron in 1979. He used Monte Carlo sampling to generate an empirical estimate of the sampling distribution. Monte Carlo sampling builds an estimate of the sampling distribution by randomly drawing a large number of samples of size k from a population and calculating, for each one, the associated value of the statistic. The relative frequency distribution of these values is an estimate of the sampling distribution for that statistic.
The procedure

The generic bootstrap method follows these basic ideas, as presented by Efron and Tibshirani (1994):

A bootstrap sample is a sample $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ obtained by random sampling with replacement from the experimental sample $x = (x_1, x_2, \ldots, x_n)$, also designated the bootstrap population. Here, the asterisk denotes that $x^*$ is a randomized version, or resampling, of $x$, rather than a new group of actual data; the bootstrap sample consists of members of $x$. For each bootstrap procedure one carries out a random resampling, with replacement, of the n elements of the experimental sample, which is employed as the parent population. The arithmetic mean of the i-th resampling is

$\bar{x}_i^* = \frac{1}{n}\sum_{j=1}^{n} x_{ij}^*$    (1)

After a number m of resamplings, the arithmetic bootstrap mean is

$\bar{x}_m^* = \frac{1}{m}\sum_{i=1}^{m} \bar{x}_i^*$    (2)

with standard deviation

$s^* = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(\bar{x}_i^* - \bar{x}_m^*\right)^2}$    (3)

The bootstrap probability distribution results from this sequence. In practice, the bootstrap distribution is built from the Monte Carlo method with a sufficiently large number of repetitions m. In this case the bootstrap mean approximates the population mean, and the distribution tends to a normal one (Manly, 1997). The convergence is guaranteed by the law of large numbers, because $\bar{x}_1^*, \bar{x}_2^*, \ldots, \bar{x}_m^*$ are nothing more than a sample of independent, identically distributed random variables.
A sketch of the bootstrap resampling loop in C++ is as follows (the helpers get_data, randomize and mean, and the arrays X, Y and means, are assumed to be defined elsewhere; the truncated original loop is completed here):

#include <cstdlib>   // rand()
get_data();          // put the variable of interest in the first n elements of the array X[]
randomize();         // initializes the random number generator
for (i = 0; i < m; i++) {              // m bootstrap resamples
    for (j = 0; j < n; j++)
        Y[j] = X[rand() % n];          // draw n points with replacement
    means[i] = mean(Y, n);             // record each resample's mean
}
Conclusion
Statistical analysis can enhance the findings produced in research. The important part is to make sure that the results are well understood, to assist in the evaluation of the whole research project. This is instrumental for CS researchers, so that the system or application produced as the outcome of the research is free from experimental flaws or software bugs. A strong justification of the parameters used, methods chosen or techniques implemented can safeguard the development stage that might follow the research period from errors.

The bootstrap method discussed in this chapter is a brave diversion from the traditional parametric inference, and it has improved analysis in much CS research. The method works well in certain circumstances but behaves badly in others: it is good for roughly normal distributions but tends to be problematic for skewed distributions, so it must be adopted with great care and a clear understanding of the data.
References
Efron, B. (1979). "Bootstrap methods: Another look at the jackknife". The Annals of Statistics, 7, 1-26.
Efron, B. (1981). "Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods". Biometrika, 68, 589-599.
Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Society of Industrial and Applied Mathematics CBMS-NSF Monographs, 38.
Diaconis, P. and Efron, B. (1983). "Computer-intensive methods in statistics". Scientific American, May, 116-130.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Application.
Mooney, C. Z. and Duval, R. D. (1993). Bootstrapping: A Nonparametric Approach to Statistical Inference. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage.
Simon, J. L. (1997). Resampling: The New Statistics.
Good, P. I. (2005). Resampling Methods: A Practical Guide to Data Analysis. Springer.