Random Sampling and Sampling Distributions · 2019-10-13 · The sampling distributions are...


10/13/2019


SBE 304: Bio-Statistics

Random Sampling and Sampling Distributions

Dr. Ayman Eldeib

Systems & Biomedical Engineering Department

Fall 2019

SBE 304: RS & SD

Sampling (Statistics)

Sampling is that part of statistical practice concerned with the selection of a subset of individual observations within a population of individuals intended to yield some knowledge about the population of concern, especially for the purposes of making predictions based on statistical inference. Sampling is an important aspect of data collection.


Random Sampling

Random sampling is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in the sample.

Simple Random Sampling

Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample; i.e., each member of the population is equally likely to be chosen at any stage in the sampling process.


Simple random sampling refers to a sampling method that has the following properties:

* The population consists of N objects.
* The sample consists of n objects.
* All possible samples of n objects are equally likely to occur.

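The main benefit of simple random sampling is that it avoids selection bias: no member of the population is systematically favoured, so the sample tends to be representative of the population. It does not guarantee that any particular sample is representative, but it is what justifies the standard methods of statistical inference.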
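The three properties above can be sketched in a few lines of Python. This is only an illustration: the population of ten subject IDs, the sample size and the seed are all assumptions made for the example, not values from the slides.

```python
import math
import random

# Hypothetical population of N = 10 subject IDs (not from the slides).
population = list(range(1, 11))
n = 3

rng = random.Random(304)             # fixed seed so the sketch is repeatable
sample = rng.sample(population, n)   # n distinct subjects, drawn without replacement

# Number of distinct samples of size n. Under simple random sampling
# each of them is equally likely, with probability 1 / C(N, n).
num_possible_samples = math.comb(len(population), n)
prob_each_sample = 1 / num_possible_samples

print(sample)
print(num_possible_samples, prob_each_sample)
```

With N = 10 and n = 3 there are 120 possible samples, so each one is selected with probability 1/120.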


To understand sampling, you first need to understand a few basic definitions:

* The total set of observations that can be made is called the population.
* A sample is a subset of a population.
* A parameter is a measurable characteristic of a population, such as a mean or standard deviation.
* A statistic is a measurable characteristic of a sample, such as a mean or standard deviation; i.e., a statistic is any function of the observations in a random sample.
* A sampling method is a procedure for selecting sample elements from a population.


Measure of Central Tendency

The Mode

The mode is the value that occurs the most frequently in the population or sample.

Suppose we draw a sample of five women and measure their weights. They weigh 100, 100, 130, 140, and 150 pounds. Since more women weigh 100 pounds than any other weight, the mode equals 100 pounds.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values. For example, in the data [1, 1, 2, 4, 4] both 1 and 4 occur twice, so the mode is not unique; such a dataset is said to be bimodal, while a set with more than two modes may be described as multimodal.
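Both cases can be checked with Python's standard `statistics` module. The sketch below uses the two datasets from the text; the variable names are only illustrative.

```python
from statistics import mode, multimode

weights = [100, 100, 130, 140, 150]   # the five women's weights, in pounds
weight_mode = mode(weights)           # the single most frequent value

data = [1, 1, 2, 4, 4]
modes = multimode(data)               # every value that attains the maximum frequency

print(weight_mode)   # 100
print(modes)         # [1, 4]
```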


The Median

To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in the sample of five women, the median value would be 130 pounds, since 130 pounds is the middle weight.


The Mean

The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Returning to the example of the five women, the mean weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds.
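As a quick check of the median and mean for the five weights, here is a minimal sketch using the standard `statistics` module. The four-value list is a made-up variation (the lightest woman dropped) added only to show the even-count rule for the median.

```python
from statistics import mean, median

weights = [100, 100, 130, 140, 150]   # pounds

weight_median = median(weights)       # middle value of the sorted data
weight_mean = mean(weights)           # sum of observations / number of observations

# Hypothetical even-sized sample: the median is the average of 130 and 140.
even_median = median([100, 130, 140, 150])

print(weight_median, weight_mean, even_median)
```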


Proportions and Percentages

When the focus is on the degree to which a population possesses a particular attribute, the measure of interest is a percentage or a proportion.

A proportion refers to the fraction of the total that possesses a certain attribute. For example, we might ask what proportion of women in our sample weigh less than 135 pounds. Since 3 women weigh less than 135 pounds, the proportion would be 3/5 or 0.60.

A percentage is another way of expressing a proportion. A percentage is equal to the proportion times 100. In our example of the five women, the percent of the total who weigh less than 135 pounds would be 100 * (3/5) or 60 percent.
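The same calculation can be sketched in Python. The names `p` and `q` follow the sample notation used in these slides (q = 1 - p); the 135-pound cut-off is the one from the example.

```python
weights = [100, 100, 130, 140, 150]   # pounds
cutoff = 135

# Count the women who have the attribute "weighs less than 135 pounds".
count_below = sum(1 for w in weights if w < cutoff)

p = count_below / len(weights)   # proportion with the attribute
q = 1 - p                        # proportion without the attribute
percentage = 100 * p             # the same information as a percentage

print(count_below, p, q, percentage)
```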


Notation

Of the various measures, the mean and the proportion are most important. The notation used to describe these measures appears below:

Note that capital letters refer to population parameters, and lower-case letters refer to sample statistics.

Sample statistics:

* $\bar{x}$: Refers to a sample mean.
* n: Number of observations in the sample.
* p: The proportion of elements in the sample that has a particular attribute.
* q: The proportion of elements in the sample that does not have a specified attribute. Note that q = 1 - p.

Population parameters:

* $\bar{X}$ or $\mu$: Refers to a population mean.
* N: Number of observations in the population.
* P: The proportion of elements in the population that has a particular attribute.
* Q: The proportion of elements in the population that does not have a specified attribute. Note that Q = 1 - P.


Measure of Variability

Some parameters describe the amount of variation among the observations in a population. For example, consider a population of four observations {5, 5, 5, 5}. Here all of the observations are equal, so there is no variation. The set {3, 5, 5, 7}, on the other hand, has some variation since some of the observations differ.


The Range

The range is the simplest measure of variation. It is the difference between the largest and smallest values.

Range = r = Maximum value - Minimum value

Therefore, the range of the four values {3, 5, 5, 7} would be 7 - 3 = 4.
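A minimal Python sketch of the range, applied to the two four-value sets mentioned in this section (the helper name `value_range` is just an illustrative choice):

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

range_varied = value_range([3, 5, 5, 7])     # 7 - 3
range_constant = value_range([5, 5, 5, 5])   # all values equal, so no variation

print(range_varied, range_constant)   # 4 0
```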


Variance

It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently.

* $\sigma^2$: The variance of the population. $\sigma$: The standard deviation of the population.
* $s^2$: The variance of the sample. $s$: The standard deviation of the sample.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

You may have noticed that the denominator in the calculation of the sample variance, unlike the denominator in the calculation of the population variance, is (n - 1) and not n. Why?

This is because the sample variance is calculated from deviations about the sample mean rather than about µ. Sample observations tend to be closer to their own sample mean than to µ, so the calculated squared deviations are, on average, too small.

As a result, a sample variance computed with n in the denominator would tend to be smaller than the population variance. To compensate for this, (n - 1) is used as the denominator in place of n in the calculation of the sample variance.
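The effect of the two denominators can be checked exactly on a tiny example. The sketch below is an illustration under stated assumptions, not part of the original slides: it takes the population {1, 3, 5, 7}, enumerates every ordered sample of size n = 2 drawn with replacement (so all samples are equally likely), and averages the two candidate estimates over all of them. The function name `average_variance_estimate` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7]   # illustrative population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def average_variance_estimate(n, denominator):
    """Average, over every ordered sample of size n drawn with replacement,
    of sum((x - xbar)^2) / denominator."""
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        squared_deviations = sum((x - xbar) ** 2 for x in sample)
        total += squared_deviations / denominator
    return total / len(samples)

n = 2
avg_with_n = average_variance_estimate(n, n)               # divide by n
avg_with_n_minus_1 = average_variance_estimate(n, n - 1)   # divide by n - 1

print(sigma2, avg_with_n, avg_with_n_minus_1)   # 5 5/2 5
```

Dividing by n gives 5/2 on average, which underestimates the population variance of 5; dividing by (n - 1) gives exactly 5 on average.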


Variance : Example 1

A population consists of four observations: {1, 3, 5, 7}. What is the variance?

Solution: First, we need to compute the population mean.

$$\mu = \frac{1 + 3 + 5 + 7}{4} = 4$$

Then,

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{9 + 1 + 1 + 9}{4} = 5$$

Variance : Example 2

A sample consists of four observations: {1, 3, 5, 7}. What is the variance?

Solution: This problem is handled exactly like the previous problem, except that we use the formula for calculating sample variance, rather than the formula for calculating population variance.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{9 + 1 + 1 + 9}{3} = \frac{20}{3} \approx 6.667$$
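Both examples can be reproduced with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n - 1. This is only a cross-check of the arithmetic above.

```python
from statistics import pvariance, variance

data = [1, 3, 5, 7]

population_variance = pvariance(data)   # Example 1: treat the data as a whole population
sample_variance = variance(data)        # Example 2: treat the same data as a sample

print(population_variance, round(sample_variance, 3))
```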


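A more efficient computational formula for the sample variance is obtained by expanding the squared deviations:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1}$$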


Sampling Distributions

For any given population of size N it is possible to get different samples of size n. Each sample may well have a different mean. In fact it is possible to get an entire distribution of different sample means from the various possible samples. The list of all possible values for a statistic and the probability associated with each value is known as a sampling distribution.
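For a very small population, the sampling distribution of the mean can be listed exhaustively. The sketch below is an illustration only: it reuses the four-observation population {1, 3, 5, 7} from the variance examples, assumes samples of size n = 2 drawn without replacement, and treats every possible sample as equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7]
n = 2

# All C(4, 2) = 6 possible samples, each equally likely.
samples = list(combinations(population, n))

# Tabulate the sample mean of each sample and convert counts to probabilities.
counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

for m, prob in sampling_distribution.items():
    print(f"sample mean = {m}: probability = {prob}")

# The mean of the sampling distribution equals the population mean (4).
expected_mean = sum(m * prob for m, prob in sampling_distribution.items())
print(expected_mean)
```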

Theorem

When the population we are sampling from has, itself, roughly the shape of a normal curve, the sampling distribution of the mean can be approximated closely with a normal distribution regardless of the size of n.


The Central Limit Theorem

For large samples ($n \geq 30$), the sampling distribution of the sample mean $\bar{Y}$ can be approximated closely with a normal distribution with mean equal to the population mean $\mu$ and standard error equal to $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the population.

The sampling distribution of the mean will be approximately normally distributed no matter what the population distribution looks like.


Keep in mind that this theorem, as stated here, describes the sample mean; it does not by itself describe the sampling distribution of other statistics.

The standard error of the mean is defined as the standard deviation of the sampling distribution of the mean.
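The theorem and the definition of the standard error can be checked with a small simulation. Everything specific in this sketch is an assumption made for illustration: the population is taken to be exponential with rate 1 (so µ = 1 and σ = 1, and clearly not normal), 5000 samples of size n = 30 are drawn, and the seed is fixed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2019)

n = 30               # sample size ("large sample" threshold quoted above)
num_samples = 5000   # number of repeated samples (arbitrary choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

observed_mean = mean(sample_means)   # should be close to mu
observed_se = stdev(sample_means)    # standard deviation of the sample means
theoretical_se = sigma / sqrt(n)     # standard error predicted by the theorem

print(round(observed_mean, 3), round(observed_se, 3), round(theoretical_se, 3))
```

The observed standard deviation of the sample means should land close to σ/√n ≈ 0.183, even though the underlying population is strongly skewed.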


Based on this theorem, if $\bar{Y}$ is the mean of a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, and n is large, then

$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$

has approximately the standard normal distribution (using the conversion formula).


Example I

In this example, the population distribution shows five levels of a characteristic/variable that were measured in the population. Each level of the variable had the same probability of occurring. The population distribution is presented below.

The scores in the population range from 1 to 5. The mean of the population is 3.0 and the standard deviation of the population is 1.41 [μ = 3.0; σ = 1.41].
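The quoted population parameters can be verified directly, assuming the five equally likely levels are the scores 1, 2, 3, 4 and 5:

```python
from statistics import mean, pstdev

levels = [1, 2, 3, 4, 5]   # five equally likely scores

mu = mean(levels)          # population mean
sigma = pstdev(levels)     # population standard deviation (divides by N)

print(mu, round(sigma, 2))   # 3 1.41
```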


Frequency distributions of sample means quickly approach the shape of a normal distribution, even if we are taking relatively few, small samples from a population that is not normally distributed (such as the uniform distribution).
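How quickly the shape changes can be seen by computing the exact sampling distribution of the mean for a few small sample sizes. This sketch assumes the uniform population of scores 1 to 5 from Example I and independent draws (sampling with replacement); the function name and the text histogram are illustrative choices only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

levels = [1, 2, 3, 4, 5]   # uniform population from Example I

def sampling_distribution_of_mean(n):
    """Exact distribution of the mean of n independent draws from `levels`."""
    counts = Counter(Fraction(sum(s), n) for s in product(levels, repeat=n))
    total = len(levels) ** n
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}

for n in (1, 2, 4):
    print(f"n = {n}")
    for m, prob in sampling_distribution_of_mean(n).items():
        bar = "#" * round(float(prob) * 100)   # one '#' per percentage point
        print(f"  {float(m):4.2f} {bar}")
```

For n = 1 the bars are flat (the uniform population itself); for n = 2 they form a triangle; by n = 4 the outline is already bell-shaped and centred on 3.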


Example II

Let's randomly select an infinite number of samples of the same size from a population that follows a Poisson distribution and calculate the mean of scores in each sample. The sampling distributions are presented as shown on the right side.


Example III

Let's randomly select an infinite number of samples of the same size from a population that follows a Normal distribution and calculate the mean of scores in each sample. The sampling distributions are presented as shown on the right side.


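Example IV

A synthetic fiber used in manufacturing carpet has tensile strength that is normally distributed with mean 75.5 psi and standard deviation 3.5 psi. Find the probability that a random sample of n = 49 fiber specimens will have sample mean tensile strength that exceeds 75.75 psi.

The sampling distribution of $\bar{Y}$ has mean and standard error

$$\mu_{\bar{Y}} = \mu = 75.5 \text{ psi}, \qquad \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} = \frac{3.5}{\sqrt{49}} = 0.5 \text{ psi}$$

Using the conversion formula $Z = \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$,

$$P(\bar{Y} \geq 75.75) = P\left(Z \geq \frac{75.75 - 75.5}{0.5}\right) = P(Z \geq 0.5) \approx 0.3085$$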
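The fiber calculation can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 75.5          # population mean tensile strength (psi)
sigma = 3.5        # population standard deviation (psi)
n = 49             # number of fiber specimens in the sample
threshold = 75.75  # sample mean to exceed (psi)

standard_error = sigma / sqrt(n)        # sigma / sqrt(n) = 0.5 psi
z = (threshold - mu) / standard_error   # conversion to the standard normal scale

# Upper-tail probability of the standard normal distribution at z.
p_exceeds = 1 - NormalDist().cdf(z)

print(standard_error, z, round(p_exceeds, 4))
```

The result is about 0.3085, so roughly 31% of samples of 49 specimens would have a mean tensile strength above 75.75 psi.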