Random Sampling and Sampling Distributions · 2019-10-13 · The sampling distributions are...
10/13/2019
SBE 304: Bio-Statistics
Random Sampling and Sampling Distributions
Dr. Ayman Eldeib
Systems & Biomedical Engineering Department
Fall 2019
SBE 304: RS & SD
Sampling (Statistics)
Sampling is that part of statistical practice concerned
with the selection of a subset of individual observations
within a population of individuals intended to yield some
knowledge about the population of concern, especially
for the purposes of making predictions based on
statistical inference. Sampling is an important aspect of
data collection.
Random Sampling
Random sampling is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in the sample.
Simple Random Sampling
Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample; i.e., each member of the population is equally likely to be chosen at any stage in the sampling process.
Simple Random Sampling
Simple random sampling refers to a sampling method that has the following properties:
* The population consists of N objects.
* The sample consists of n objects.
* All possible samples of n objects are equally likely to occur.
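The main benefit of simple random sampling is that the selection procedure is free of systematic bias, so the sample tends to be representative of the population. No single sample is guaranteed to be representative, but this property is what allows valid statistical conclusions to be drawn.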
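The three properties above can be sketched in a few lines of Python. This is only an illustration: the population labels, the sample size and the seed are hypothetical values chosen for the example, not part of the lecture.

```python
import random
from itertools import combinations

# Hypothetical population of N = 6 labelled subjects (illustrative only).
population = ["A", "B", "C", "D", "E", "F"]
n = 3  # sample size

# Every subset of n distinct subjects is one possible sample.
all_samples = list(combinations(population, n))

# Under simple random sampling each possible sample is equally likely.
prob_each = 1 / len(all_samples)

# random.sample draws n distinct members without replacement;
# the seed only makes this particular run repeatable.
rng = random.Random(304)
sample = rng.sample(population, n)

print(len(all_samples), prob_each)
print(sample)
```

With N = 6 and n = 3 there are 20 possible samples, so each one has probability 1/20 of being the sample drawn.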
To understand sampling, you need to first understand a few basic
definitions:
The total set of observations that can be made is called the population.
A sample is a subset of a population.
A parameter is a measurable characteristic of a population, such as a
mean or standard deviation.
A statistic is a measurable characteristic of a sample, such as a
mean or standard deviation; i.e., a statistic is any function of the
observations in a random sample.
A sampling method is a procedure for selecting sample elements from
a population.
Measure of Central Tendency
The Mode
The mode is the value that occurs the most frequently in the population or sample.
Suppose we draw a sample of five women and measure their weights. They weigh 100, 100, 130, 140, and 150 pounds. Since more women weigh 100 pounds than any other weight, the mode would equal 100 pounds.
The mode is not necessarily unique, since the same maximum frequency may be attained at different values. Given the list of data [1, 1, 2, 4, 4] the mode is not unique - the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.
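As a small sketch of the two cases just described, Python's standard `statistics.multimode` returns every value that attains the maximum frequency. The variable names are illustrative.

```python
from statistics import multimode

weights = [100, 100, 130, 140, 150]  # the five women's weights, in pounds
data = [1, 1, 2, 4, 4]               # the bimodal list from the text

# multimode returns a list of all values with the highest frequency,
# in the order in which they first appear in the data.
weight_modes = multimode(weights)
data_modes = multimode(data)

print(weight_modes)  # a single mode
print(data_modes)    # two modes, so the data are bimodal
```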
The Median
To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in the sample of five women, the median value would be 130 pounds, since 130 pounds is the middle weight.
The Mean
The mean of a sample or a population is computed by adding all of the observations and
dividing by the number of observations. Returning to the example of the five women, the mean
weight would equal (100 + 100 + 130 + 140 + 150)/5
= 620/5 = 124 pounds.
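The median and mean of the five weights can be checked with the standard `statistics` module. The four-value list is an added, hypothetical case that only illustrates the even-count rule for the median.

```python
from statistics import mean, median

weights = [100, 100, 130, 140, 150]  # the five women's weights, in pounds

mean_weight = mean(weights)      # (100 + 100 + 130 + 140 + 150) / 5
median_weight = median(weights)  # middle value of the sorted list

# Hypothetical even-sized sample: the median is the average
# of the two middle values, here (100 + 130) / 2.
even_median = median([100, 100, 130, 140])

print(mean_weight, median_weight, even_median)
```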
Proportions and Percentages
When the focus is on the degree to which a population possesses a
particular attribute, the measure of interest is a percentage or a
proportion.
A proportion refers to the fraction of the total that possesses a certain
attribute. For example, we might ask what proportion of women in our
sample weigh less than 135 pounds. Since 3 women weigh less than
135 pounds, the proportion would be 3/5 or 0.60.
A percentage is another way of expressing a proportion. A percentage
is equal to the proportion times 100. In our example of the five women,
the percent of the total who weigh less than 135 pounds would be 100 *
(3/5) or 60 percent.
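The same proportion and percentage can be computed directly. This is a minimal sketch; the variable names are illustrative, and `q` follows the convention q = 1 - p.

```python
weights = [100, 100, 130, 140, 150]  # the five women's weights, in pounds
threshold = 135

# Count the observations that have the attribute "weighs less than 135 pounds".
count_below = sum(1 for w in weights if w < threshold)

p = count_below / len(weights)  # proportion with the attribute
q = 1 - p                       # proportion without the attribute
percent = 100 * p               # the same information as a percentage

print(count_below, p, q, percent)
```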
Notation
Of the various measures, the mean and the proportion are most important. The notation used to describe these measures appears below. Note that capital letters refer to population parameters, and lower-case letters refer to sample statistics.
Sample statistics:
* $\bar{x}$: Refers to a sample mean.
* n: Number of observations in the sample.
* p: The proportion of elements in the sample that has a particular attribute.
* q: The proportion of elements in the sample that does not have a specified attribute. Note that q = 1 - p.
Population parameters:
* $\bar{X}$, $\mu$: Refers to a population mean.
* N: Number of observations in the population.
* P: The proportion of elements in the population that has a particular attribute.
* Q: The proportion of elements in the population that does not have a specified attribute. Note that Q = 1 - P.
Measure of Variability
Some parameters attempt to describe the amount of variation among the observations. For example, consider a population of four observations {5, 5, 5, 5}. Here, all of the observations are equal, so there is no variation. The set {3, 5, 5, 7}, on the other hand, has some variation since some observations are different.
The Range
The range is the simplest measure of variation. It is the difference between the largest and smallest observations.
Range = r = Maximum value - Minimum value
Therefore, the range of the four observations {3, 5, 5, 7} would be 7 - 3 = 4.
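A minimal sketch of the range calculation follows. The function name `value_range` is a hypothetical helper, not a library routine.

```python
def value_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

varied = value_range([3, 5, 5, 7])    # 7 - 3
constant = value_range([5, 5, 5, 5])  # identical values, so no variation

print(varied, constant)
```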
Variance
It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently.
* $\sigma^2$: The variance of the population, $\sigma^2 = \sum (X_i - \mu)^2 / N$. $\sigma$: The standard deviation of the population.
* $s^2$: The variance of the sample, $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$. $s$: The standard deviation of the sample.
You may have noticed that the denominator in the calculation of sample variance, unlike the denominator in the calculation of population variance, is (n - 1) and not n. Why?
This is because the sample variance is calculated from deviations about the sample mean rather than about µ. Sample observations tend to be closer to the sample mean than to µ, so the calculated deviations are smaller.
As a result, a sample variance computed with n in the denominator would tend to underestimate the population variance. To compensate for this, (n - 1) is used as the denominator in place of n in the calculation of sample variance.
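A small simulation can make this argument concrete. The sketch below assumes a standard normal population (true variance 1), samples of size 5, and 20000 repetitions; these settings, the seed and the helper name `variance_estimates` are illustrative choices, not part of the lecture.

```python
import random

def variance_estimates(sample):
    """Return (sum of squares / n, sum of squares / (n - 1)) about the sample mean."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

rng = random.Random(2019)  # fixed seed so the run is repeatable
trials, n = 20000, 5

total_by_n = 0.0
total_by_n1 = 0.0
for _ in range(trials):
    # Draw one sample from a population with mean 0 and variance 1.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    by_n, by_n1 = variance_estimates(sample)
    total_by_n += by_n
    total_by_n1 += by_n1

avg_by_n = total_by_n / trials    # tends toward (n - 1) / n = 0.8
avg_by_n1 = total_by_n1 / trials  # tends toward the true variance, 1.0

print(round(avg_by_n, 3), round(avg_by_n1, 3))
```

Averaged over many samples, dividing by n settles near 0.8 of the true variance, while dividing by (n - 1) settles near the true variance itself.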
Variance : Example 1
A population consists of four observations: {1, 3, 5, 7}. What is the variance?
Solution: First, we need to compute the population mean.
$\mu = (1 + 3 + 5 + 7)/4 = 4$. Then $\sigma^2 = \sum (X_i - \mu)^2 / N = (9 + 1 + 1 + 9)/4 = 20/4 = 5$.
Variance : Example 2
A sample consists of four observations: {1, 3, 5, 7}. What is the variance?
Solution: This problem is handled exactly like the previous problem, except that we use the formula for calculating sample variance, rather than the formula for calculating population variance.
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = (9 + 1 + 1 + 9)/3 = 20/3 \approx 6.667$
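Both examples can be reproduced with the standard `statistics` module, where `pvariance` divides by N and `variance` divides by (n - 1). This is a quick check rather than part of the original solution.

```python
from statistics import pvariance, variance

data = [1, 3, 5, 7]

pop_var = pvariance(data)  # treats data as a whole population: 20 / 4
samp_var = variance(data)  # treats data as a sample: 20 / 3

print(pop_var, round(samp_var, 3))
```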
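A more efficient computational formula for the sample variance is obtained as follows:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$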
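The sketch below assumes the usual shortcut form, $s^2 = \left[\sum x_i^2 - (\sum x_i)^2 / n\right] / (n - 1)$, which needs only the running sums of $x_i$ and $x_i^2$, and compares it with the definition. Both function names are hypothetical.

```python
def sample_variance_definition(xs):
    """Sample variance from squared deviations about the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Sample variance from the sum of x and the sum of x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [1, 3, 5, 7]
by_definition = sample_variance_definition(data)
by_shortcut = sample_variance_shortcut(data)  # (84 - 256/4) / 3

print(round(by_definition, 3), round(by_shortcut, 3))
```

The shortcut avoids a separate pass to compute the mean, although in floating-point arithmetic it can lose precision when the mean is large relative to the spread of the data.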
Sampling Distributions
For any given population of size N it is possible to get
different samples of size n. Each sample may well have a
different mean. In fact it is possible to get an entire
distribution of different sample means from the various
possible samples. The list of all possible values for a
statistic and the probability associated with each value is
known as a sampling distribution.
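For a very small population, the sampling distribution of the mean can be listed exhaustively. The sketch below assumes the four-value population {1, 3, 5, 7} and samples of size n = 2 drawn without replacement; both choices are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7]  # N = 4, population mean 4
n = 2                      # sample size (illustrative)

# All possible samples of size n, each equally likely under simple random sampling.
samples = list(combinations(population, n))

# The statistic of interest: the mean of each possible sample.
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct mean paired with its probability.
counts = Counter(means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

# Expected value of the sample mean over all possible samples.
expected_mean = sum(m * p for m, p in sampling_distribution.items())

for m, p in sampling_distribution.items():
    print(m, p)
print(expected_mean)
```

The six possible samples give means 2, 3, 4, 4, 5 and 6, and the expected value of the sample mean equals the population mean of 4.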
Theorem
When the population we are
sampling from has, itself, roughly
the shape of a normal curve, the
sampling distribution of the mean
can be approximated closely with
a normal distribution regardless
of the size of n.
The Central Limit Theorem
For large samples ($n \geq 30$), the sampling distribution of the sample mean $\bar{Y}$ can be approximated closely with a normal distribution with mean $\mu_{\bar{Y}}$ equal to the population mean $\mu$ and standard error $\sigma_{\bar{Y}}$ equal to $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the population.
The sampling distribution of the mean will be approximately normally distributed no matter what the population distribution looks like.
Keep in mind that this theorem applies only to the mean and not other statistics.
The standard error of the mean is defined as the standard deviation of the sampling distribution of the mean.
Based on this theorem, if $\bar{Y}$ is the mean of a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and n is large, then
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
has approximately the standard normal distribution (using the conversion formula).
Example I
In this example, the population distribution shows five levels of a characteristic/variable that were measured in the population. Each level of the variable had the same probability of occurring.
[Figure: population distribution, with the five levels equally likely]
The scores in the population range from 1 to 5. The mean of the population is 3.0 and the standard deviation of the population is 1.41 [μ = 3.0; σ = 1.41].
Frequency distributions of
sample means quickly
approach the shape of a
normal distribution, even if
we are taking relatively few,
small samples from a
population that is not
normally distributed (such as
the uniform distribution).
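A rough simulation of Example I is sketched below, assuming the five scores 1 to 5 are equally likely (μ = 3.0, σ ≈ 1.41). The sample size, number of samples and seed are illustrative choices, not taken from the slides.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(13)   # fixed seed so the run is repeatable
levels = [1, 2, 3, 4, 5]  # five equally likely scores
n = 10                    # size of each sample (illustrative)
num_samples = 5000        # number of samples drawn (illustrative)

# Draw many samples with replacement and record the mean of each one.
sample_means = [mean(rng.choices(levels, k=n)) for _ in range(num_samples)]

center = mean(sample_means)    # should be close to mu = 3.0
spread = pstdev(sample_means)  # should be close to sigma / sqrt(n)
predicted = sqrt(2) / sqrt(n)  # sigma = sqrt(2) for this population

print(round(center, 3), round(spread, 3), round(predicted, 3))
```

A histogram of `sample_means` would show the bell shape the slide describes, with its spread close to the standard error σ/√n ≈ 0.447.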
Example II
Let's randomly select a very large (conceptually infinite) number of samples of the same size from a population that follows a Poisson distribution and calculate the mean of scores in each sample.
[Figure: the resulting sampling distributions of the mean for the Poisson population]
Example III
Let's randomly select a very large (conceptually infinite) number of samples of the same size from a population that follows a Normal distribution and calculate the mean of scores in each sample.
[Figure: the resulting sampling distributions of the mean for the Normal population]
Example IV
A synthetic fiber used in manufacturing carpet has tensile strength that is normally distributed with mean 75.5 psi and standard deviation 3.5 psi. Find the probability that a random sample of n = 49 fiber specimens will have sample mean tensile strength that exceeds 75.75 psi.
$\mu_{\bar{Y}} = \mu = 75.5$ psi, $\sigma_{\bar{Y}} = \sigma/\sqrt{n} = 3.5/\sqrt{49} = 0.5$ psi
$Z = \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$
$P(\bar{Y} \geq 75.75) = P(Z \geq 0.5)$
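The remaining step in the fiber example, evaluating P(Z ≥ 0.5), can be sketched with the standard library's `statistics.NormalDist` in place of a printed normal table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 75.5          # population mean tensile strength, psi
sigma = 3.5        # population standard deviation, psi
n = 49             # number of fiber specimens in the sample
threshold = 75.75  # sample-mean value of interest, psi

standard_error = sigma / sqrt(n)       # 3.5 / 7
z = (threshold - mu) / standard_error  # standardized threshold

# P(Z >= z) is the upper tail of the standard normal distribution.
prob = 1 - NormalDist().cdf(z)

print(standard_error, z, round(prob, 4))
```

This gives a probability of about 0.3085 that the sample mean exceeds 75.75 psi.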