4 Empirical

RESEARCH METHODS IN COMPUTER SCIENCE

NORDIN ABU BAKAR 2011

    Research Design : An Empirical Approach

    Introduction

Computer science, like other scientific disciplines, needs specific methods and tests to justify the results that have been produced in the research. Much research in CS leaves this statistical part out of the discussion and puts the demonstration of the prototype forward as the ultimate justification of the findings. For the research to be widely accepted, CS researchers must accompany their results with justifications that show the numbers are valid and correct. There are a number of approaches that can be used to carry out research in CS; the empirical approach is one of them. The empirical approach uses statistics as one of the ways to analyse the findings or test hypotheses. Analysing data statistically requires researchers to understand some basic notions in statistics, which this chapter is about to explore.

There are several reasons why statistical analysis is relevant in computer science :

The analysis explains the results on a common platform that everyone can understand.

It explains the situation being studied, giving a clearer picture that may not have been understood before.

It measures whether the research has been successfully executed.

It answers the research questions.

It evaluates whether there is more to the topic being investigated, or whether it is a dead end and the researcher should find another way to understand the situation.

That statistical analysis is too hard to work with is just another myth; if properly handled and patiently studied, it will benefit the researcher in the long run. The knowledge gained after each analysis remains with the researcher for tackling the next problem. It grows, and after each research project a researcher will feel more confident and try to get the most out of the process.


Level of Measurement

Each level of measurement admits certain statistical tests :

Nominal scale   ->  chi-square test
Ordinal scale   ->  non-parametric tests
Interval scale  ->  t-test / F-test
Ratio scale     ->  all tests


    Nominal Scale

The word nominal means to name. In statistics, numbers are assigned to variables only to classify or categorize them; such as : 1 = male, 2 = female. In this manner the data set can be easily manipulated, which aids data analysis. The only arithmetic that is relevant for such data is counting. The statistical usage is very limited and is normally good for keeping track of people, objects and events. The common statistical test that can be performed for this kind of data set is the chi-square test.

Ordinal Scale

Numbers or values are assigned to objects or events to establish rank or order; such as 1st, 2nd, 3rd or 4th positions. Intervals of the scale are not equal, i.e., adjacent ranks need not differ by the same amount, so no more precise comparisons are possible. The median is an appropriate measure of central tendency; a percentile or quartile is used for measuring dispersion, and rank-order correlations are possible. The statistical tests that are possible for this data are non-parametric tests. This scale is commonly used in qualitative research.

Ordinal level question :

In which category was your income last year?

1. Above RM100K

2. RM50K - RM100K

3. Below RM50K


    Interval Scale

The numbers or values assigned to objects or events can be categorized, ordered and assumed to have equal distance between scale values. Typical examples are degrees of temperature or test scores : a Fahrenheit temperature of 72 degrees, or a test score in the range 0-100. There is no absolute zero or unique origin; only an arbitrary zero, and hence no capacity to measure the complete absence of a trait or characteristic. This type of data is more powerful than the ordinal scale due to the equality of intervals. The sample mean is an appropriate measure of central tendency, and the standard deviation (SD) is widely used for dispersion. The common statistical tests for this data are the t-test and F-test for significance.

    Ratio Scale

The numbers in this data set represent objects or events which can be categorized, ordered and assumed to have equal distance between scale values, and which have a real zero point. The values can be used in any statistical test whose requirements they satisfy. This is the highest level of measurement : all mathematical operations and statistical techniques can be applied, and all manipulations possible with real numbers can be carried out.

Ratio level question :

What was your income last year ?


    Figure 1 : Steps in The Planning Phase

    Population

Population refers to the entire group of subjects to be studied. Determining the size of the population is very important because of the need for inference later on. The


validity of the results as a reference to the whole population depends on how much data are collected to infer about the entire population.

    Sampling

Sampling is a process of drawing some elements from the population and analysing them. Since the sampling unit is a subgroup of the object under study, it must reflect the whole group as much as possible; failing to do so will jeopardize the data, the results and the findings. In the sampling process we need to define the sampling frame and the sampling method. The sampling frame represents the elements in a population from which a sample is drawn. It could be membership names, staff directories, registered students, zakat recipients, licensed traders, and the like. With a clearly defined group, the researcher can determine who should be included and who should be excluded, thus minimizing the amount of error in the data. Once the sampling frame is determined, a researcher can select an appropriate sampling method.

    How to determine the sample size

The sample size must represent the entire population of the subjects being studied. In order to avoid error due to misrepresentation, determining the sample size for a simple random sample must specify :

The level of confidence

The acceptable amount of error

The value of the SD or proportion

    Research Design

Research methodologies are commonly characterized by their research designs. Each research design specifies the method used in the experiment to collect the data. A research design is defined as a plan for conducting research which usually includes a specification of the elements to be examined and the methods to be used.


The research design is selected as the most suitable and feasible method for hypothesis testing or for answering the research questions. The diagram in Figure 2 shows the different research designs.

    Figure 2 : Research Designs

    True Experimental Design

In experimental design, a specification for a research study is laid out to answer a specific research question such as : does variable A cause variable B to increase in value?

The plan must include :

Methods of selecting and assigning subjects

Number and types of variables


The main purpose of the experiment is to apply some controls and study the cause-and-effect relationships among the variables. The variables known as independent variables might be assumed to cause the ones known as dependent variables. The experiment will then show, with exact values, whether or not the assumption is correct and acceptable. Control mechanisms in experimental design play an important role in the validity as well as the reliability of the results. In order to make sure that the results are valid, the researcher must be able to :

Have a good amount of data from at least two comparison groups

Apply random selection

Manipulate the independent variable to apply different treatments

The threats to validity in experimental design are :

1. Events which occur between the first and second measurements.

2. Changes in the subjects during the course of the experiment.

3. Subjects might change their opinion after the first measurement, so the second measurement may differ because of this knowledge; the researcher or research assistant might also change the way they execute the measurement.

    Factorial Design

This research design is one of the true experimental designs, used whenever the research has more than one independent variable. Other true experimental designs are :

Solomon 4-group Design

Pretest-Posttest Design

Posttest Only Design


    Figure 3: Steps in The Action Phase


K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85
Mean = Total / Count = 611 / 10 = 61.1

The mean value represents the total picture of the group or list of numbers. It takes into account all values in the group, which justifies why the mean can represent the group. However, the mean works well only if the values are well distributed; if there exists one odd value (called an outlier) the mean could be well off. The process of deriving the mean can also be a constraint if the data set is large.

The median is the middle number in a list. When the numbers in a list are positioned in order, the median is the middle number if the count of numbers in the list is odd, or the value halfway between the two middle numbers if the count is even. For example ;

K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85 (sorted : 30, 45, 49, 50, 55, 66, 67, 75, 85, 89)
Median = (Mid1 + Mid2) / 2 = (55 + 66) / 2 = 60.5

Compared to the mean, the median resolves that problem whenever there is an odd value in the data, because an outlier does not affect the derivation of the median. The process is also simple and easy to carry out, as it does not involve any numerical computation. But because there is no numerical computation, the value taken as the median is not precise and does not tell much about the data.

The mode is the number that occurs most frequently in the list. For example ;

K = 2, 2, 3, 4, 5, 4, 5, 4, 4, 4
Mode = most frequent number = 4

The mode process is quite simple and straightforward. Due to its simplicity, the mode is very raw and can be questionable, especially when there is not much difference in the frequencies of the data.


    Standard Deviation

Standard Deviation (SD) is a measure of dispersion from the mean. After the mean has been determined, each data value can be measured for its distance from the mean, and the summary of these distances is the standard deviation (SD). If the SD is large, the data are widely distributed, with many values far from the mean. On the other hand, if the SD is small, most values are very close to the mean. If the SD is zero there is no dispersion and all values are the same. The SD also positions a data value above or below the mean, which can be used to evaluate that particular value against the rest of the values in the list.

    Correlation

When the objective of the research is to find relationships between variables, correlation analysis is inevitable. In computer science this research element is also present and very common. For example, a study might want to find the relationship between parameter A and parameter B in an application; between software errors and testing procedures or programmer attitude; or between machine architecture and speed of execution. If one variable scores as highly as the other variable, the relationship is referred to as a positive correlation. On the other hand, if one variable scores highly while the other scores low, there is a negative correlation. There are also cases where the data points are scattered and do not present a cohesive trend; the relationship is then deemed a zero correlation.

To determine the type of correlation for a data set, a scatter graph will do the job, but for a more precise numerical value a statistical test must be performed. Pearson's and Spearman's rho are two examples of statistical tests that can facilitate the correlation analysis. The value produced by the tests is referred to as the correlation coefficient, which lies in the range -1 to +1. A coefficient close to +1 indicates a strong positive correlation, one close to -1 a strong negative correlation, and the closer it is to 0 the weaker the correlation.

Let's say that after the test, the coefficient between X and Y is 0.8 (r = 0.8) and the coefficient between X and Z is 0.4 (r = 0.4). The interpretation of the correlation coefficient is


quite tricky and not as straightforward as it may seem. The correlation between X and Y is in fact 64% (100 * r^2), meaning that 64% of the variation in Y is shared with the variation in X; the correlation between X and Z is 16% (100 * r^2), so only 16% of the variation is shared. However, it is important to note that correlation analysis does not tell us whether X causes Y to change; it only indicates that correlated change exists. If one wants to know whether X causes Y, then causal research should be employed, where specific experiments are carried out in the lab to find out whether X causes Y to change. This leads us to another interesting test in statistical analysis called hypothesis testing.

    Correlation Analysis with SPSS

What is it?

Very often researchers intend to visualize a situation where things are different. The need to explain the circumstances in a more rational manner leads us to find a reasonable way to analyze and draw a conclusion. Do students who are good in math achieve a higher CGPA, and do those who are not good in math get a lower CGPA? What we are trying to understand is whether the variable good in math correlates with the variable CGPA. If this were the case then we would say that there is a positive correlation between the variables.

Positive correlation means that as a score on one variable increases, the corresponding score on the other variable does the same (in SPSS, the value will be positive). As in the above context, if the student is good in math his CGPA will be higher as well.

There is also the situation where, as a score on one variable goes up, the score on the other corresponding variable goes down. This is referred to as negative correlation (in SPSS, the value in the table will be negative). One example of such correlation is between the weight of a person and health : as the weight goes up, the less healthy that person tends to be.


    What is it for?

Measures the strength and direction of the linear relationship between a pair of variables. If we have more than two variables, then we need a multivariate analysis.

    How to use it?

Using SPSS, the steps to follow are :

Choose Statistics -> Correlate -> Bivariate; the Correlations dialogue box will appear

Select the variables to correlate and move them to the variables box

Choose the appropriate correlation coefficient

o Interval data -> Pearson

o Ordinal data -> Spearman

Select a one-tailed or two-tailed test

o One-tailed -> direction of the relation is known

o Two-tailed -> direction is unknown

Ok

Sample output is as follows :

                         Variable A   Variable B   Variable C
Pearson Correlation
  Variable A             1.000        0.690*       0.840*
  Variable B             0.690*       1.000        0.750*
  Variable C             0.840*       0.750*       1.000
Sig. (2-tailed)
  Variable A             .            0.002        0.005
  Variable B             0.002        .            0.000
  Variable C             0.005        0.000        .
N
  Variable A             20           20           20
  Variable B             20           20           20
  Variable C             20           20           20

* correlation is significant at the 0.01 level (2-tailed)

    Testing the hypothesis

A hypothesis is an assumption that one can make about the impact of certain variables on one another. For example :

1. Parameter A is better than parameter B

2. Selection sort performs faster than binary sort

3. Cost affects software performance

Using statistical techniques, the outcome of this process, whether the hypothesis is accepted or rejected, can be properly justified and supported. The steps to exercise hypothesis testing are as follows :

1. Choose a null hypothesis. Make the opposite assumption about a vital variable of your study : if the study wants to prove A, choose B (the opposite of A) as the null hypothesis. The trick is that when B is rejected, A will be statistically supported.

2. Choose an alternative hypothesis that can be accepted in case the null hypothesis (in 1) is rejected. This is A (as in 1); when B is rejected, A can be accepted even though there is no direct evidence that it is true.

3. Set the conditions for when to reject the hypothesis and when not to reject it.

4. Draw a random sample and select a statistical method.

5. Based on the test, choose to reject or not to reject the null hypothesis.

6. Accept the alternative hypothesis if the null hypothesis is rejected.


Hypothesis Testing

To describe the process of hypothesis testing we feel that we cannot do better than follow the five-step method introduced by Neave (1976a, as it appears in Kanji (1999)) :

Step 1

Formulate the practical problem in terms of hypotheses. The focus should go into creating an alternative hypothesis, Ha, since this is the more important from a practical point of view. It should express the range of situations that we wish the test to be able to diagnose; in this sense a positive test can indicate that we should take action of some kind. Once this is fixed, it should be obvious whether we carry out a one- or two-tailed test. The null hypothesis, H0, needs to be very simple and represents the status quo, i.e., that there is no difference between the processes being tested. It is basically a standard or control with which the evidence pointing to the alternative can be compared.

Step 2

Calculate a statistic T, a function purely of the data. All good test statistics should have two properties : (a) they should tend to behave differently when H0 is true from when Ha is true; and (b) their probability distribution should be calculable under the assumption that H0 is true. It is also desirable that tables of this probability distribution exist.

Step 3

Choose a critical region. One should decide on the kind of values of T which will most strongly point to Ha being true rather than H0. A value of T lying in a suitably defined critical region will lead us to reject H0 in favour of Ha; if T lies outside the critical region we do not reject H0. We should never conclude by accepting H0.

    Step 4


    Decide the size of the critical region. This involves specifying how great a risk we

    are prepared to run of coming to an incorrect conclusion.


    Chi-Square Test

    What is it?

The data that have been collected need to be processed and analyzed. If the data are non-quantitative, not numerical but categorical criteria such as sex or having a headache, then the chi-square test can be used for the analysis. Is there a connection between these two criteria? we may ask in the research. In statistics, this is called a measure of association : the research is looking into the association between two variables which are not numeric in nature.

    What is it for ?

Measures of association between two variables

    Level of distribution in the population

How to use it?

The test can be used whenever a table of frequencies can be produced. Using SPSS, the following steps will derive the results :

Choose Analyze -> Summarize -> Crosstabs; SPSS will pop up a dialogue window

Select the appropriate variables

Click on Statistics and choose the appropriate test : chi-square

Ok


The results may be as follows :

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             43.617a   4    .000
Likelihood Ratio               46.826    4    .000
Linear-by-Linear Association   41.263    1    .000
N of Valid Cases               250

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 12.10.

Column 2 gives the value of the test statistic

Column 3 states the degrees of freedom (df)

Column 4 indicates whether the result is significant or not

Take these values and compare the Pearson chi-square value against the table value for df = 4 at the 0.05 level of significance. If the computed value is greater than the one in the table, the null hypothesis (H0) is rejected and the alternative hypothesis is accepted.


    Confidence Interval

When the measurements have been collected from the experiments or tests, there is a set of numerical values that needs to be analysed. Some relevant questions are :

1. How certain can we be of the values ?

2. If there are two sets of values, how certain can we be that the two sets are different ?

The need to answer these questions leads us to more statistical analysis and its vital role in computer science research. Let's say we have a sample mean equal to 0.248. So what? What does it mean? Is it good or bad? There must be a way to justify this value and apply some evaluation criteria to the number. How confident can we be that the value is the true mean ? In other words, we want to come up with a confidence interval that qualifies the value as follows : if a sample of 40 was drawn and the mean calculated, 95% of the time the mean would lie between a lower bound (lb) and an upper bound (ub), such that lb < Mk < ub. The bootstrap method is suitable for this process, and it is done in the following steps :

Choose 1000 random samples (with replacement) of size 40 from our original 40 points.

Take the mean of each sample.

Sort the means and take the values at the 25th and 975th positions.

Here the lower bound is 0.2451 and the upper bound is 0.2505. Since

0.2451 < 0.248 < 0.2505

the calculated mean can be accepted as the true mean of the said population.


    The Bootstrap Method

The bootstrap method is an attractive procedure for CS researchers as it offers an escape from the usual complicated statistical procedures. It suits the CS environment nicely due to its flexibility in the sampling criteria : the samples of data used to find the confidence intervals may have distributions which depart from the traditional parametric distributions. Constraints in producing enough data for standard statistical procedures are very common in CS research, and many researchers have resorted to staying away from statistical analysis altogether. The bootstrap method gives an opportunity to produce statistically reliable analysis regardless of the form of the data probability density; it makes no assumption about the data distribution. Probably the main definitive point of the bootstrap method is that the entire sampling distribution is estimated by relying on the fact that the sample distribution is a good estimate of the population distribution. Traditional parametric inference, on the other hand, depends on the assumption that the sample and the population are normally distributed.

The bootstrap method was initially proposed by Efron in 1979. He used Monte Carlo sampling to generate an empirical estimate of the sampling distribution. Monte Carlo sampling builds an estimate of the sampling distribution by randomly drawing a large number of samples of size k from a population and calculating for each one the associated value of the statistic. The relative frequency distribution of these values is an estimate of the sampling distribution for that statistic.

    The procedure

The generic bootstrap method has the following basic ideas, as presented by Efron and Tibshirani (1994) ;

A bootstrap sample x* = (x1*, x2*, ..., xn*) is obtained by random sampling with replacement from the experimental sample x = (x1, x2, ..., xn), also designated the bootstrap population. Here the asterisk denotes that x* is a randomized version, or resampling, of x rather than a new group of actual data; the bootstrap sample consists of members of x. For each bootstrap procedure one carries out a random resampling, sampling with replacement from the n elements of the experimental sample, which is employed as the parent population. The arithmetic mean of the i-th resample is then reached using equation 1 :

xbar_i* = (1/n) * sum_{j=1..n} x_j*        (1)

After a number m of resamplings, the arithmetic bootstrap mean is obtained by equation 2,

xbar* = (1/m) * sum_{i=1..m} xbar_i*        (2)

with standard deviation given by equation 3 :

s* = sqrt( (1/(m-1)) * sum_{i=1..m} (xbar_i* - xbar*)^2 )        (3)

The bootstrap probability distribution results from this sequence. In practice, the bootstrap distribution is built from the Monte Carlo method with a sufficiently large number m of repetitions. In this case, the bootstrap mean approximates the mean of the population and the distribution tends to a normal one (Manly, 1997). The convergence is guaranteed by the law of large numbers, because x1*, ..., xn* are nothing more than a sample of independent, identically distributed random variables.

A sketch of the bootstrap resampling loop in C++ is as follows (get_data(), mean() and the arrays are placeholders, as in the original fragment; the loop truncated at the page break has been completed) ;

#include <cstdlib>                   // rand()

get_data();                          // put the variable of interest in the first n elements of the array X[]
randomize();                         // initializes the random number generator
for (i = 0; i < m; i++) {            // m bootstrap resamples
    for (j = 0; j < n; j++)          // draw each resample of size n with replacement
        Y[j] = X[rand() % n];
    means[i] = mean(Y, n);           // record the statistic (here the mean) of this resample
}


    Conclusion

Statistical analysis can enhance the findings that have been produced in the research. The important part is to make sure that the results are well understood, to assist in the evaluation of the whole research project. This is instrumental for CS researchers, so that the system or application produced as the outcome of the research is free from experimental flaws or software bugs. A strong justification of the parameters used, methods chosen or techniques implemented can safeguard the development stage, which might follow the research period, from errors.

The bootstrap method discussed in this chapter is a brave diversion from traditional parametric inference and has improved analysis in much CS research. The method works well in certain circumstances but behaves badly in others : it is good for normal distributions but tends to be problematic for skewed distributions, so it must be adopted with great care and a clear understanding of the data.


    References

Bradley Efron (1979). "Bootstrap methods: Another look at the jackknife", The Annals of Statistics, 7, 1-26.

Bradley Efron (1981). "Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods", Biometrika, 68, 589-599.

Bradley Efron (1982). The Jackknife, the Bootstrap, and Other Resampling Plans, Society of Industrial and Applied Mathematics CBMS-NSF Monographs, 38.

P. Diaconis, Bradley Efron (1983). "Computer-intensive methods in statistics", Scientific American, May, 116-130.

Bradley Efron, Robert J. Tibshirani (1993). An Introduction to the Bootstrap, New York: Chapman & Hall.

Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Applications, Cambridge University Press.

Mooney, C. Z. & Duval, R. D. (1993). Bootstrapping: A Nonparametric Approach to Statistical Inference, Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage.

Simon, J. L. (1997). Resampling: The New Statistics.

Good, P. I. (2005). Resampling Methods: A Practical Guide to Data Analysis, ISBN : 978-0-8176, Springer.