RESEARCH METHODS IN COMPUTER SCIENCE
NORDIN ABU BAKAR, 2011

Research Design: An Empirical Approach
Introduction
Computer science, like other scientific disciplines, needs specific methods and tests to justify the results produced in research. Much research in CS leaves this statistical part out of the discussion and puts a demonstration of the prototype forward as the ultimate justification of the findings. For research to be widely accepted, CS researchers must accompany their results with justification that the numbers are valid and correct. A number of approaches can be used to carry out research in CS; the empirical approach is one of them. The empirical approach uses statistics as one of the ways to analyse findings and test hypotheses. Analysing data statistically requires researchers to understand some basic notions in statistics, which this chapter is about to explore.
There are several reasons why statistical analysis is relevant in computer science:
- The analysis explains the results on a common platform that everyone can understand.
- It is an explanation of the situation being studied; this understanding gives a clearer picture than was available before.
- It measures whether the research has been successfully executed.
- It answers the research questions.
- It evaluates whether the topic currently being investigated has more to offer or is a dead end, in which case the researcher should find another way to understand the situation.
The notion that statistical analysis is too hard to work with is a myth; properly handled and patiently studied, it will benefit the researcher in the long run. The knowledge gained from each analysis remains with the researcher for tackling the next problem. It grows, and after each research project a researcher will feel more confident and try to get the most out of the process.
Level of Measurement

Each level of measurement admits different statistical tests:

- Nominal scale  -> chi-square test
- Ordinal scale  -> non-parametric tests
- Interval scale -> t-test / F-test
- Ratio scale    -> all tests
Nominal Scale

The word nominal means "to name". In statistics, numbers are assigned to variables only to classify or categorize them, such as 1 = male, 2 = female. In this manner the data set can be easily manipulated, which aids data analysis. The only arithmetic relevant for such data is counting. The statistical usage is very limited and is normally good for keeping track of people, objects and events. The common statistical test that can be performed on this kind of data set is the chi-square test.
Ordinal Scale

Numbers or values are assigned to objects or events to establish rank or order, such as 1st, 2nd, 3rd or 4th position. Intervals of the scale are not equal, i.e., adjacent ranks need not differ by the same amount, so no more precise comparisons are possible. The median is an appropriate measure of central tendency; a percentile or quartile is used for measuring dispersion, and rank-order correlations are possible. The statistical tests suited to this data are non-parametric tests. This scale is commonly used in qualitative research.

Ordinal-level question:
In which category was your income last year?
1. Above RM100K
2. RM50K to RM100K
3. Below RM50K
Interval Scale

Numbers or values are assigned to objects or events which can be categorized, ordered and assumed to have equal distance between scale values. Typical examples are degrees of temperature (e.g. 72 degrees Fahrenheit) or a test score (0 to 100). There is no absolute zero or unique origin; only an arbitrary zero can be had, and hence no capacity to measure the complete absence of a trait or characteristic. This type of data is more powerful than the ordinal scale due to the equality of intervals. The sample mean is an appropriate measure of central tendency, and the standard deviation (SD) is widely used for dispersion. The common statistical tests for this data are the t-test and F-test for significance.
Ratio Scale

The numbers in this data set represent objects or events which can be categorized, ordered and assumed to have equal distance between scale values, and which have a real zero point. The values can be used in any statistical test whose requirements the data meet. This is the highest level of measurement: all mathematical operations and statistical techniques can be applied, and all manipulations possible with real numbers can be carried out.

Ratio-level question:
What was your income last year?
Figure 1 : Steps in The Planning Phase
Population
Population refers to the entire group of subjects to be studied. The size of the population is very important to determine because of the need for inference later on. The validity of the results as a reference to the whole population depends on how much data are collected to infer about the entire population.
Sampling

Sampling is the process of drawing some elements from the population and analysing them. Since the sampling unit is a subgroup of the objects under study, it must reflect the whole group as much as possible; failing to do so will jeopardize the data, the results and the findings. In the sampling process we need to define a sampling frame and sampling methods. The sampling frame represents the elements in a population from which a sample is drawn. It could be membership lists, staff directories, registered students, zakat recipients, licensed traders, and the like. With a clearly defined group, the researcher can determine who should be included and who should be excluded, which minimizes the amount of error in the data. Once the sampling frame is determined, a researcher can select an appropriate sampling method.
How to determine the sample size

The sample must represent the entire population of subjects being studied. To avoid error due to misrepresentation, determining the sample size for a simple random sample requires specifying:
- the level of confidence;
- the acceptable amount of error;
- the value of the SD or proportion.
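The three quantities above combine in the standard sample-size formula for estimating a mean with a simple random sample, n = (z * SD / error)^2. The sketch below is a minimal illustration; the z value, SD and tolerated error are hypothetical inputs, not values from this chapter.

```python
import math

def sample_size_for_mean(z, sigma, error):
    """Sample size needed to estimate a population mean to within
    +/- error, given the population SD (sigma) and the z value
    for the chosen confidence level (1.96 for 95%)."""
    return math.ceil((z * sigma / error) ** 2)

# Hypothetical example: 95% confidence (z = 1.96), SD = 15, tolerated error = 3
n = sample_size_for_mean(1.96, 15, 3)
print(n)  # 97
```

For estimating a proportion, the same shape applies with the variance term replaced by p(1 - p).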
Research Design
Research methodologies are commonly characterized by their research designs. Each research design specifies the method used in the experiment to collect the data. A research design is defined as a plan for conducting research, which usually includes a specification of the elements to be examined and the methods to be used.
The research design is selected as the most suitable and feasible method for testing the hypotheses or answering the research questions. The diagram in Figure 2 shows the different research designs.
Figure 2 : Research Designs
True Experimental Design

In experimental design, a specification for a research study is laid out to answer a specific research question, such as: does variable A cause variable B to increase in value?
The plan must include:
- methods of selecting and assigning subjects;
- the number and types of variables.
The main purpose of the experiment is to apply some controls and study the cause-and-effect relationships among the variables. The variables known as independent variables are assumed to cause changes in the ones known as dependent variables. The experiment will then show, with exact values, whether or not the assumption is correct and acceptable. Control mechanisms in experimental design play an important role in the validity as well as the reliability of the results. To make sure that the results are valid, the researcher must be able to:
- collect a good amount of data from at least two comparison groups;
- apply random selection;
- manipulate the independent variable to apply different treatments.
The threats to validity in experimental design are:
1. Events which occur between the first and second measurements.
2. Changes in the subjects during the course of the experiment.
3. Subjects might change their responses after the first measurement, so the second measurement might differ because of this knowledge; the researcher or research assistant might also change the way they execute the measurement.
Factorial Design

This design is a true experimental design used whenever the research has more than one independent variable. Other true experimental designs include:
- Solomon four-group design
- Pretest-posttest design
- Posttest-only design
Figure 3: Steps in The Action Phase
K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85
Mean(K) = Total / Count = 611 / 10 = 61.1
The mean represents the total picture of the group or list of numbers. It takes into account all values in the group, which is a positive justification for letting the mean represent the group. However, the mean works well only if the values are well distributed; if there is one odd value (an outlier), the mean can be well off. Deriving the mean can also be a constraint if the data set is large.

The median is the middle number in a list. When the numbers in a list are placed in order, the median is the middle number if the count of numbers is odd, or the value halfway between the two middle numbers if the count is even. For example:
K = 45, 30, 67, 89, 55, 66, 75, 49, 50, 85
Median(K) = (Mid1 + Mid2) / 2 = (55 + 66) / 2 = 60.5
Compared to the mean, the median resolves the outlier problem, because an odd value does not affect its derivation. The process is also simple and easy to carry out, as it involves no numerical computation. But because there is no numerical computation, the value taken as the median is not precise and does not tell much about the data.
The mode is the number that occurs most frequently in the list. For example:
K = 2, 2, 3, 4, 5, 4, 5, 4, 4, 4
Mode(K) = most frequent number = 4
Finding the mode is quite simple and straightforward. Due to its simplicity, however, the mode is very raw and can be questionable, especially when there is little difference among the frequencies in the data.
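As a quick sketch, Python's standard `statistics` module reproduces the three measures for the lists used above:

```python
import statistics

K = [45, 30, 67, 89, 55, 66, 75, 49, 50, 85]
print(statistics.mean(K))    # 61.1
print(statistics.median(K))  # 60.5, i.e. (55 + 66) / 2

M = [2, 2, 3, 4, 5, 4, 5, 4, 4, 4]
print(statistics.mode(M))    # 4
```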
Standard Deviation
Standard deviation (SD) is a measure of dispersion from the mean. After a value for the mean has been determined, the distance of the other data from the mean can be calculated; this is the standard deviation (SD). A large SD indicates that the data are widely distributed, with many values far from the mean. On the other hand, if the SD is small, most values are very close to the mean; if the SD is zero, there is no dispersion and all values are the same. The SD positions a value above or below the mean, and this can be used to evaluate that particular value against the rest of the values in the data list.
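As a sketch, the SD of the list used earlier can be computed with the standard `statistics` module (here the sample SD, which divides by n - 1), and a value can then be positioned relative to the mean in SD units:

```python
import statistics

K = [45, 30, 67, 89, 55, 66, 75, 49, 50, 85]
mean = statistics.mean(K)
sd = statistics.stdev(K)  # sample standard deviation, divisor n - 1
print(round(mean, 1), round(sd, 2))  # 61.1 18.66

# Position a value relative to the mean, in SD units (z score):
z = (89 - mean) / sd
print(round(z, 2))  # 1.49
```

A z score near +1.5, as here, says the value 89 sits well above the mean but still within the bulk of the data.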
Correlation
When the objective of the research is to find relationships between variables, correlation analysis is inevitable. In computer science this research element is also present and very common. For example, a study might want to find the relationship between parameter A and parameter B in an application, between software errors and testing procedures or programmer attitudes, or between machine architecture and speed of execution. If one variable scores as highly as the other variable, the relationship is referred to as a positive correlation. On the other hand, if one variable scores highly while the other scores low, there is a negative correlation. There are also cases where the pairs are scattered around and do not present a cohesive trend; such a relationship is deemed zero correlation.

To determine the type of correlation for a data set, a scatter graph will do the job, but for a more precise numerical value a statistical test must be performed. Pearson's r and Spearman's rho are two examples of statistical tests that facilitate correlation analysis. The value produced by these tests is the correlation coefficient, which lies in the range -1 to +1. A coefficient close to +1 indicates a strong positive correlation, one close to -1 a strong negative correlation, and one close to 0 a weak correlation.
Let's say that after the test, the coefficient between X and Y is 0.8 (r = 0.8) and the coefficient between X and Z is 0.4 (r = 0.4). The interpretation of the correlation coefficient is quite tricky and not as straightforward as it may seem. The correlation between X and Y corresponds to 64% (100 * r^2), meaning that 64% of the variation in Y is shared with X. The correlation between X and Z corresponds to only 16% (100 * r^2) of shared variation. However, it is important to note that correlation analysis does not tell us whether X causes Y to change; it only indicates that correlated change exists. If one wants to know whether X causes Y, then causal research should be employed, where specific experiments are carried out in the lab to find out whether X causes Y to change. This leads us to another interesting tool in statistical analysis: hypothesis testing.
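As an illustration of r and 100 * r^2, here is a plain-Python Pearson coefficient; the paired scores are hypothetical and `pearson_r` is our own helper, not a library call:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired scores
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
r = pearson_r(X, Y)
print(round(r, 2))            # 0.77: a fairly strong positive correlation
print(round(100 * r * r, 1))  # 60.0: percent of shared variation
```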
Correlation Analysis with SPSS
What is it?

Very often researchers need to explain why things differ, and explaining the circumstances in a rational manner requires a reasonable way to analyze the data and draw a conclusion. Do students who are good in math achieve a higher CGPA, and do those who are not good in math get a lower CGPA? What we are trying to understand is whether the variable "good in math" correlates with the variable "CGPA". If this were the case, we would say there is a positive correlation between the variables.

Positive correlation means that as a score on one variable increases, the corresponding score on the other variable increases as well (in SPSS, the value will be positive). In the above context, if the student is good in math, his CGPA will be higher as well.

There are also situations where, as a score on one variable goes up, the score on the other variable goes down. This is referred to as negative correlation (in SPSS, the value in the table will be negative). One example of such a correlation is between a person's weight and their health: as the weight goes up, the less healthy that person tends to be.
What is it for?
It measures the strength and direction of the linear relationship between a pair of variables. If we have more than two variables, then we need a multivariate analysis.
How to use it?
Using SPSS, the steps are as follows:
- Choose Statistics -> Correlate -> Bivariate; the Correlations dialogue box will appear.
- Select the variables to correlate and move them to the Variables box.
- Choose the appropriate correlation coefficient:
  o Interval data -> Pearson
  o Ordinal data -> Spearman
- Select a one-tailed or two-tailed test:
  o One-tailed -> direction of the relationship is known
  o Two-tailed -> direction is unknown
- Click OK.
A sample output is as follows:

                         Variable A   Variable B   Variable C
Pearson Correlation
  Variable A             1.000        0.690*       0.840*
  Variable B             0.690*       1.000        0.750*
  Variable C             0.840*       0.750*       1.000
Sig. (2-tailed)
  Variable A             .            0.002        0.005
  Variable B             0.002        .            0.000
  Variable C             0.005        0.000        .
N
  Variable A             20           20           20
  Variable B             20           20           20
  Variable C             20           20           20

* Correlation is significant at the 0.01 level (2-tailed).
Testing the hypothesis
A hypothesis is an assumption one can make about an effect on certain variables. For example:
1. Parameter A is better than parameter B.
2. Selection sort performs faster than binary sort.
3. Cost affects software performance.
Using statistical techniques, the outcome of this process, whether the hypothesis is accepted or rejected, can be properly justified and supported. The steps in hypothesis testing are as follows:
1. Choose a null hypothesis: make the opposite assumption about the vital variable of your study. If the study wants to prove A, choose B (the opposite of A) as the null hypothesis. The trick is that when B is rejected, A will be statistically supported.
2. Choose an alternative hypothesis that can be accepted in case the null hypothesis (in 1) is rejected. This is A (as in 1); if B is rejected, A can be accepted even though there is no direct evidence that it is true.
3. Set the condition for when to reject the hypothesis and when not to reject it.
4. Draw a random sample and select a statistical method.
5. Based on the test, choose to reject or not to reject the null hypothesis.
6. Accept the alternative hypothesis if the null hypothesis is rejected.
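The steps above can be sketched on a hypothetical comparison of the run times of two sort implementations: H0 says the mean times are equal, Ha says they differ. The data and the approximate critical value (about 2.23 for roughly 10 degrees of freedom, two-tailed, at the 5% level) are illustrative assumptions:

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic for H0: mean(a) == mean(b)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5

# Hypothetical run times (ms) of two sort implementations
A = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
B = [13.0, 13.4, 12.9, 13.1, 13.3, 12.8]

t = welch_t(A, B)
reject_h0 = abs(t) > 2.23  # critical region: |t| beyond the critical value
print(round(t, 2), reject_h0)
```

Here |t| falls deep in the critical region, so H0 is rejected and the alternative hypothesis is accepted.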
Hypothesis Testing

To describe the process of hypothesis testing, we feel we cannot do better than follow the five-step method introduced by Neave (1976a, as it appears in Kanji (1999)):

Step 1
Formulate the practical problem in terms of hypotheses. The focus should go into creating the alternative hypothesis, Ha, since this is the more important from a practical point of view. It should express the range of situations that we wish the test to be able to diagnose; in this sense a positive test indicates that we should take action of some kind. Once this is fixed, it should be obvious whether we carry out a one- or two-tailed test. The null hypothesis, H0, needs to be very simple and represents the status quo, i.e., that there is no difference between the processes being tested. It is basically a standard or control against which the evidence pointing to the alternative can be compared.

Step 2
Calculate a statistic T, a function purely of the data. All good test statistics should have two properties: (a) they should tend to behave differently when H0 is true from when Ha is true; and (b) their probability distribution should be calculable under the assumption that H0 is true. It is also desirable that tables of this probability distribution exist.

Step 3
Choose a critical region. One should decide on the kind of values of T which will most strongly point to Ha being true rather than H0. A value of T lying in a suitably defined critical region will lead us to reject H0 in favour of Ha; if T lies outside the critical region, we do not reject H0. We should never conclude by accepting H0.
Step 4
Decide the size of the critical region. This involves specifying how great a risk we are prepared to run of coming to an incorrect conclusion.
Chi-Square Test
What is it?

The data that have been collected need to be processed and analyzed. If the data are non-quantitative, i.e., not numerical but categorical criteria such as sex or having a headache, then the chi-square test can be used. Is there a connection between these two criteria, we may ask in the research? In statistics, this is called a measure of association: the research looks into the association between two variables which are not numeric in nature.

What is it for?
- Measures association between two variables
- Measures the level of a distribution in the population
How to use it?

The test can be used if a table of frequencies can be produced. Using SPSS, the following steps will derive the results:
- Choose Analyze -> Summarize -> Crosstabs; SPSS will pop up a dialogue window.
- Select the appropriate variables, then click on Statistics and choose the chi-square test.
- Click OK.
The results may be as follows:

                               Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square             43.617a   4     .000
Likelihood Ratio               46.826    4     .000
Linear-by-Linear Association   41.263    1     .000
N of Valid Cases               250

a. 0 cells (.0%) have an expected count less than 5. The minimum expected count is 12.10.
Column 2 gives the value of the test statistic, column 3 states the degrees of freedom (df), and column 4 indicates whether the result is significant. Compare the Pearson chi-square value against the table value with df = 4 at the 0.05 level of significance. If the computed value is greater than the one in the table (equivalently, if the significance in column 4 is below 0.05), the null hypothesis (H0) is rejected and the alternative hypothesis is accepted.
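As a sketch of the computation behind such output, here is a plain-Python chi-square statistic for a small hypothetical sex-by-headache table; the counts and the critical value (3.84 for df = 1 at the 0.05 level) are illustrative:

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table: rows = sex, columns = headache yes/no
observed = [[30, 20],
            [10, 40]]
chi2 = chi_square(observed)
print(round(chi2, 2), chi2 > 3.84)  # 16.67 True -> reject H0
```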
Confidence Interval
When the measurements have been collected from the experiments or tests, there is a set of numerical values that needs to be analysed. Some relevant questions are:
1. How certain can we be of the values?
2. If there are two sets of values, how certain can we be that the two sets are different?

The need to answer these questions leads us to more statistical analysis and its vital role in computer science research. Let's say the sample mean is 0.248. So what? What does it mean? Is it good or bad? There must be a way to justify this value and apply some evaluation criteria to the number. How confident are we that the value is the true mean? In other words, we want a confidence interval that brackets the value as follows: if a sample of 40 was drawn and the mean calculated, 95% of the time the mean would lie between a lower bound (lb) and an upper bound (ub) such that lb < Mk < ub. The bootstrap method is suitable for this, and it is done in the following steps:
- Draw 1000 random samples (with replacement) of size 40 from our original 40 points.
- Take the mean of each sample.
- Sort the means and take the values at the 25th and 975th positions.

Suppose the lower bound is 0.2451 and the upper bound is 0.2505. Since
0.2451 < 0.248 < 0.2505,
the calculated mean can be accepted as the true mean of the population above.
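The three steps above can be sketched in Python. Since the chapter's original 40 data points are not listed, the sample below is synthetic (drawn around 0.248), so the printed bounds will differ from 0.2451 and 0.2505:

```python
import random
import statistics

def bootstrap_ci(data):
    """95% percentile bootstrap confidence interval for the mean."""
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))  # resample with replacement
        for _ in range(1000)
    )
    return means[25], means[975]  # 25th and 975th of the 1000 sorted means

random.seed(1)
sample = [random.gauss(0.248, 0.01) for _ in range(40)]  # synthetic stand-in data
lb, ub = bootstrap_ci(sample)
print(lb < statistics.mean(sample) < ub)  # the sample mean lies inside the interval
```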
The Bootstrap Method
The bootstrap method is an attractive procedure for CS researchers as it offers an escape from the usual complicated statistical procedures. It suits the CS environment nicely due to its flexibility in the sampling criteria: the samples of data used to find the confidence intervals may have distributions that depart from the traditional parametric distributions. The difficulty of producing enough data for standard statistical procedures is very common in CS research, and many have resorted to staying away from statistical analysis altogether. The bootstrap method gives an opportunity to produce statistically reliable analysis regardless of the form of the data's probability density; it makes no assumption about the underlying data distribution. Probably the defining point of the bootstrap method is that the entire sampling distribution is estimated by relying on the fact that the sample's distribution is a good estimate of the population distribution. Traditional parametric inference, on the other hand, depends on the assumption that the sample and the population are normally distributed.

The bootstrap method was initially proposed by Efron in 1979. He used Monte Carlo sampling to generate an empirical estimate of the sampling distribution. Monte Carlo sampling builds an estimate of the sampling distribution by randomly drawing a large number of samples of size k from a population and calculating, for each one, the associated value of the statistic. The relative frequency distribution of these values is an estimate of the sampling distribution for that statistic.
The procedure

The generic bootstrap method follows these basic ideas, as presented by Efron and Tibshirani (1994):

A bootstrap sample is a sample $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$ obtained by random sampling with replacement from the experimental sample $x = (x_1, x_2, \ldots, x_n)$, also designated the bootstrap population. Here, the asterisk denotes that $x^*$ is a randomized version, or resampling, of $x$, rather than a new group of actual data; the bootstrap sample consists of members of $x$. For each bootstrap procedure one carries out a random resampling, with replacement, of the n elements of the experimental sample, which is employed as the parent population. The arithmetic mean of the i-th resampling is

$\bar{x}_i^* = \frac{1}{n}\sum_{j=1}^{n} x_{ij}^*$    (1)

After a number m of resamplings, the arithmetic bootstrap mean is

$\bar{x}_m^* = \frac{1}{m}\sum_{i=1}^{m} \bar{x}_i^*$    (2)

with standard deviation

$s^* = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(\bar{x}_i^* - \bar{x}_m^*\right)^2}$    (3)

The bootstrap probability distribution results from this sequence. In practice, the bootstrap distribution is built from the Monte Carlo method with a sufficiently large number of repetitions m. In this case the bootstrap mean approximates the population mean, and the distribution tends to a normal one (Manly, 1997). The convergence is guaranteed by the law of large numbers, because $\bar{x}_1^*, \bar{x}_2^*, \ldots, \bar{x}_m^*$ are nothing more than a sample of independent, identically distributed random variables.
A sketch of the bootstrap resampling loop in C++ is as follows (the helpers get_data, randomize and mean, and the arrays X, Y and means, are assumed to be defined elsewhere; the truncated original loop is completed here):

#include <cstdlib>   // rand()
get_data();          // put the variable of interest in the first n elements of the array X[]
randomize();         // initializes the random number generator
for (i = 0; i < m; i++) {              // m bootstrap resamples
    for (j = 0; j < n; j++)
        Y[j] = X[rand() % n];          // draw n points with replacement
    means[i] = mean(Y, n);             // record each resample's mean
}
Conclusion
Statistical analysis can enhance the findings produced in research. The important part is to make sure that the results are well understood, to assist in the evaluation of the whole research project. This is instrumental for CS researchers, so that the system or application produced as the outcome of the research is free from experimental flaws or software bugs. A strong justification of the parameters used, methods chosen or techniques implemented can safeguard the development stage that might follow the research period from errors.

The bootstrap method discussed in this chapter is a brave diversion from the traditional parametric inference, and it has improved analysis in much CS research. The method works well in certain circumstances but behaves badly in others: it is good for roughly normal distributions but tends to be problematic for skewed distributions, so it must be adopted with great care and a clear understanding of the data.
References
Efron, B. (1979). "Bootstrap methods: Another look at the jackknife". The Annals of Statistics, 7, 1-26.
Efron, B. (1981). "Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods". Biometrika, 68, 589-599.
Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Society of Industrial and Applied Mathematics CBMS-NSF Monographs, 38.
Diaconis, P. and Efron, B. (1983). "Computer-intensive methods in statistics". Scientific American, May, 116-130.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Application.
Mooney, C. Z. and Duval, R. D. (1993). Bootstrapping: A Nonparametric Approach to Statistical Inference. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage.
Simon, J. L. (1997). Resampling: The New Statistics.
Good, P. I. (2005). Resampling Methods: A Practical Guide to Data Analysis. Springer.