Random Sampling and Sampling Distributions · 2019-10-13 · The sampling distributions are...

10/13/2019


SBE 304: Bio-Statistics

Random Sampling and Sampling Distributions

Dr. Ayman Eldeib

Systems & Biomedical Engineering Department

Fall 2019

SBE 304: RS & SD

Sampling (Statistics)

Sampling is that part of statistical practice concerned with the selection of a subset of individual observations within a population of individuals intended to yield some knowledge about the population of concern, especially for the purposes of making predictions based on statistical inference. Sampling is an important aspect of data collection.


Random Sampling

Random sampling is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in the sample.

Simple Random Sampling

Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample; i.e., each member of the population is equally likely to be chosen at any stage in the sampling process.

Simple Random Sampling (Cont’d)

Simple random sampling refers to a sampling method that has the following properties:

* The population consists of N objects.
* The sample consists of n objects.
* All possible samples of n objects are equally likely to occur.

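The main benefit of simple random sampling is that it removes selection bias: because every possible sample is equally likely, the sample is representative of the population on average, although any single sample can still differ from the population by chance. This is what makes the statistical conclusions drawn from the sample valid.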
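As a minimal sketch of the three properties above, Python's standard `random.sample` draws n distinct items so that every subset of size n is equally likely. The population of 20 subject IDs, the helper name `simple_random_sample`, and the seed are assumptions made only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every size-n subset is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population of N = 20 subject IDs
population = list(range(1, 21))
sample = simple_random_sample(population, n=5, seed=304)
print(sample)
```

Sampling here is without replacement, so no subject can appear twice in the same sample.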



To understand sampling, you need to first understand a few basic definitions:

* The total set of observations that can be made is called the population.
* A sample is a subset of a population.
* A parameter is a measurable characteristic of a population, such as a mean or standard deviation.
* A statistic is a measurable characteristic of a sample, such as a mean or standard deviation; i.e., a statistic is any function of the observations in a random sample.
* A sampling method is a procedure for selecting sample elements from a population.



Measure of Central Tendency

The Mode

The mode is the value that occurs the most frequently in the population or sample.

Suppose we draw a sample of five women and measure their weights. They weigh 100, 100, 130, 140, and 150 pounds. Since more women weigh 100 pounds than any other weight, the mode would equal 100 pounds.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values. Given the list of data [1, 1, 2, 4, 4] the mode is not unique - the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.


The Median

To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in the sample of five women, the median value would be 130 pounds; since 130 pounds is the middle weight.


The Mean

The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Returning to the example of the five women, the mean weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds.
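The three measures of central tendency for the five-women example can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and `multimode` requires Python 3.8 or later.

```python
import statistics

weights = [100, 100, 130, 140, 150]  # pounds, from the five-women example

mode = statistics.mode(weights)      # most frequent value
median = statistics.median(weights)  # middle value of the sorted data
mean = statistics.mean(weights)      # sum of observations / number of observations
print(mode, median, mean)  # 100 130 124

# multimode returns every value that attains the maximum frequency,
# which covers the bimodal list used in the discussion of the mode
modes = statistics.multimode([1, 1, 2, 4, 4])
print(modes)  # [1, 4]
```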



Proportions and Percentages

When the focus is on the degree to which a population possesses a particular attribute, the measure of interest is a percentage or a proportion.

A proportion refers to the fraction of the total that possesses a certain attribute. For example, we might ask what proportion of women in our sample weigh less than 135 pounds. Since 3 women weigh less than 135 pounds, the proportion would be 3/5 or 0.60.

A percentage is another way of expressing a proportion. A percentage is equal to the proportion times 100. In our example of the five women, the percent of the total who weigh less than 135 pounds would be 100 * (3/5) or 60 percent.
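The proportion and percentage in this example can be reproduced with a short count. The sketch below uses illustrative variable names; `q` is the complementary proportion, 1 - p.

```python
weights = [100, 100, 130, 140, 150]  # pounds, from the five-women example
threshold = 135

# Count the observations that have the attribute "weighs less than 135 pounds"
count_below = sum(1 for w in weights if w < threshold)

p = count_below / len(weights)                   # proportion with the attribute
q = 1 - p                                        # proportion without the attribute
percentage = 100 * count_below / len(weights)    # proportion times 100

print(count_below, p, percentage)
```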



Notation

Of the various measures, the mean and the proportion are most important. The notation used to describe these measures appears below:

Note that capital letters refer to population parameters, and lower-case letters refer to sample statistics.

Sample statistics:

* $\bar{x}$: Refers to a sample mean.
* n: Number of observations in the sample.
* p: The proportion of elements in the sample that has a particular attribute.
* q: The proportion of elements in the sample that does not have a specified attribute. Note that q = 1 - p.

Population parameters:

* $\bar{X}$ or µ: Refers to a population mean.
* N: Number of observations in the population.
* P: The proportion of elements in the population that has a particular attribute.
* Q: The proportion of elements in the population that does not have a specified attribute. Note that Q = 1 - P.



Measure of Variability

Some parameters attempt to describe the amount of variation among the observations. For example, consider a population of four observations {5, 5, 5, 5}. Here, all of the observations are equal, so there is no variation. The set {3, 5, 5, 7}, on the other hand, has some variation since some observations differ.


The Range

The range is the simplest measure of variation. It is the difference between the largest and smallest observations.

Range = r = Maximum value - Minimum value

Therefore, the range of the four observations {3, 5, 5, 7} would be 7 - 3 = 4.
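A minimal sketch of the range calculation follows; the function name `value_range` is illustrative only.

```python
def value_range(observations):
    """Range = maximum value - minimum value."""
    return max(observations) - min(observations)

r_varied = value_range([3, 5, 5, 7])    # some variation
r_constant = value_range([5, 5, 5, 5])  # no variation
print(r_varied, r_constant)  # 4 0
```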


Variance

It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently.

σ²: The variance of the population. σ: The standard deviation of the population.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

s²: The variance of the sample. s: The standard deviation of the sample.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$



You may have noticed that the denominator in the calculation of sample variance, unlike the denominator in the calculation of population variance, is (n - 1) and not n. Why?

This is because the sample variance is calculated from deviations with respect to the sample mean. Sample observations tend to be closer to the sample mean than to µ, so the calculated deviations are smaller.

As a result, a sample variance computed with n in the denominator would tend to underestimate the population variance. To compensate for this, (n - 1) is used as the denominator in place of n in the calculation of sample variance.
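The effect of the denominator can be checked exactly on a small case. The sketch below assumes the population {1, 3, 5, 7} and samples of size n = 2 drawn with replacement (an assumption chosen so that every ordered sample is equally likely and independent). It averages both versions of the variance over all possible samples; the helper name `variance` is illustrative.

```python
from itertools import product

def variance(values, denominator):
    """Sum of squared deviations from the mean of `values`, divided by `denominator`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / denominator

population = [1, 3, 5, 7]
sigma2 = variance(population, len(population))  # population variance, divides by N

n = 2
# Every ordered sample of size n drawn with replacement is equally likely
samples = list(product(population, repeat=n))

# Average of the sample variance over all samples, for each choice of denominator
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```

On average, dividing by n gives only half of the population variance in this case, while dividing by (n - 1) recovers it.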


Variance : Example 1

A population consists of four observations: {1, 3, 5, 7}. What is the variance?

Solution: First, we need to compute the population mean.

$$\mu = \frac{1 + 3 + 5 + 7}{4} = 4$$

Then,

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$$

Variance : Example 2

A sample consists of four observations: {1, 3, 5, 7}. What is the variance?

Solution: This problem is handled exactly like the previous problem, except that we use the formula for calculating sample variance, rather than the formula for calculating population variance.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{9 + 1 + 1 + 9}{3} = \frac{20}{3} \approx 6.667$$
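Both examples can be checked with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n - 1. This is a sketch for verification only, not part of the original solution.

```python
import statistics

data = [1, 3, 5, 7]

pop_var = statistics.pvariance(data)    # Example 1: treat the data as a population
sample_var = statistics.variance(data)  # Example 2: treat the data as a sample

print(pop_var, round(sample_var, 3))
```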


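A more efficient computational formula for the sample variance is obtained as follows:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$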
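The standard shortcut form of the sample variance, (Σx² - (Σx)²/n) / (n - 1), needs only the sum and the sum of squares of the observations. The sketch below compares it with the definition on the sample {1, 3, 5, 7}; the function names are illustrative.

```python
def sample_variance_definition(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut form: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

sample = [1, 3, 5, 7]
by_definition = sample_variance_definition(sample)
by_shortcut = sample_variance_shortcut(sample)
print(round(by_definition, 3), round(by_shortcut, 3))
```

The shortcut is convenient for hand calculation, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, because two nearly equal numbers are subtracted.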



Sampling Distributions

For any given population of size N it is possible to get different samples of size n. Each sample may well have a different mean. In fact it is possible to get an entire distribution of different sample means from the various possible samples. The list of all possible values for a statistic and the probability associated with each value is known as a sampling distribution.
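For a small population the sampling distribution of the mean can be listed exhaustively. The sketch below assumes a hypothetical population {1, 3, 5, 7} and samples of size n = 2 drawn without replacement, with every subset equally likely. Exact fractions are used so the probabilities are not rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its probability, assuming every
    size-n subset (drawn without replacement) is equally likely."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {mean: Fraction(c, len(samples)) for mean, c in sorted(counts.items())}

dist = sampling_distribution_of_mean([1, 3, 5, 7], n=2)
for mean, prob in dist.items():
    print(mean, prob)

# The mean of the sampling distribution equals the population mean (4 here)
expected_mean = sum(mean * prob for mean, prob in dist.items())
print(expected_mean)
```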


Theorem

When the population we are sampling from has, itself, roughly the shape of a normal curve, the sampling distribution of the mean can be approximated closely with a normal distribution regardless of the size of n.


The Central Limit Theorem

For large samples (n ≥ 30), the sampling distribution of the sample mean $\bar{Y}$ can be approximated closely with a normal distribution with mean $\mu_{\bar{Y}}$ equal to the population mean $\mu$ and standard error $\sigma_{\bar{Y}}$ equal to $\sigma / \sqrt{n}$, where $\sigma$ is the standard deviation of the population.

The sampling distribution of the mean will be approximately normally distributed no matter what the population distribution looks like.


Keep in mind that this theorem applies only to the mean and not other statistics.

The standard error of the mean is defined as the standard deviation of the sampling distribution of the mean.



Based on this theorem, if $\bar{Y}$ is the mean of a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and if n is large, then

$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$$

has approximately the standard normal distribution (using the conversion formula).



Example I

In this example, the population distribution shows five levels of a characteristic/variable that were measured in the population. Each level of the variable had the same probability of occurring. The population distribution is presented below.

[Figure: population distribution, uniform over the scores 1 to 5]

The scores in the population range from 1 to 5. The mean of the population is 3.0 and the standard deviation of the population is 1.41 [μ = 3.0; σ = 1.41].



Frequency distributions of sample means quickly approach the shape of a normal distribution, even if we are taking relatively few, small samples from a population that is not normally distributed (such as the uniform distribution).
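For the uniform population of Example I, this behaviour can be checked exactly rather than by simulation. The sketch below assumes samples are drawn with replacement, so every ordered sample of size n is equally likely, and enumerates all of them. The function names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt

scores = [1, 2, 3, 4, 5]  # each level equally likely, as in Example I

def mean_and_sd(values):
    """Mean and standard deviation (dividing by the number of values)."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / len(values))

def sample_means(n):
    """Means of all equally likely ordered samples of size n (with replacement)."""
    return [sum(s) / n for s in product(scores, repeat=n)]

mu, sigma = mean_and_sd(scores)

for n in (1, 2, 4):
    centre, standard_error = mean_and_sd(sample_means(n))
    # Compare the enumerated standard error with sigma / sqrt(n)
    print(n, round(centre, 3), round(standard_error, 3), round(sigma / sqrt(n), 3))

# Frequency table of the sample means for n = 2: already peaked in the middle
freq_n2 = Counter(sample_means(2))
print(sorted(freq_n2.items()))
```

The mean of the sample means stays at μ, the standard error matches σ/√n, and the flat population distribution already turns into a peaked, symmetric distribution of means at n = 2.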


Example II

Let's randomly select an infinite number of samples of the same size from a population that follows a Poisson distribution and calculate the mean of the scores in each sample.

[Figure: sampling distributions of the sample mean for a Poisson population]


Example III

Let's randomly select an infinite number of samples of the same size from a population that follows a Normal distribution and calculate the mean of the scores in each sample.

[Figure: sampling distributions of the sample mean for a Normal population]


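Example IV

A synthetic fiber used in manufacturing carpet has tensile strength that is normally distributed with mean 75.5 psi and standard deviation 3.5 psi. Find the probability that a random sample of n = 49 fiber specimens will have sample mean tensile strength that exceeds 75.75 psi.

$$\mu_{\bar{Y}} = \mu = 75.5 \text{ psi}, \qquad \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.5}{\sqrt{49}} = 0.5 \text{ psi}$$

$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} = \frac{75.75 - 75.5}{0.5} = 0.5$$

$$P(\bar{Y} \ge 75.75) = P(Z \ge 0.5) = 1 - 0.6915 = 0.3085$$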
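The tensile-strength calculation in Example IV can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 75.5, 3.5, 49   # population mean, population SD, sample size
threshold = 75.75              # psi

standard_error = sigma / sqrt(n)       # sigma / sqrt(n)
z = (threshold - mu) / standard_error  # standardized sample mean
prob = 1 - NormalDist().cdf(z)         # upper-tail probability P(Z >= z)

print(round(z, 2), round(prob, 4))  # 0.5 0.3085
```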
