Introduction to Inferential Statistics

27
08/15/22 Dr Tarek Amin 1 Introduction to Inferential Statistics Dr. Tarek Tawfik

Transcript of Introduction to Inferential Statistics

Page 1: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin1

Introduction to Inferential Statistics

Dr. Tarek Tawfik

Page 2: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin2

Sampling DistributionWhat are the steps needed to summarize information?

Collection of measurements

Data reduction (descriptive)

All possible cases included A sub-set (sample)) census(

Population Parameters Sample Statistics

This is a dead end Inferential statistics

Page 3: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin3

Why sampling instead of whole population?

1- Samples are usually cheaper and quicker.2- It is sometimes impossible to locate all members of a population (not included in the list, unavailable, difficult to reach, unwilling to participate).3- Research sometimes destroys the units of analysis so that a census would destroy the population (batteries quality control check).4- Sometimes sampling is more accurate than census (inexperienced data collection team during census).

Page 4: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin4

Problem with Sampling

Do sample statistics are representative to the population as those concluded if we conduct a complete and accurate census?

This problem can be addressed using inferential statistics.

Inferential Statistics: are the numerical techniques for making conclusions about a population based on the information obtained from a random sample drawn from that population.

Page 5: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin5

Steps to Conduct Inferential Statistics

Raw Data of a random sample

Sample Statistics (descriptive Statistics) Measures of Central tendency, dispersion,

association

Inferential StatisticsAllow us to generalize from the sample

statistics to the population.

Page 6: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin6

The Sampling Distribution of a Sample Statistics Concept

≈ Let’s assume that we want to survey a community of 400, the age of them were recorded and having the following parameters:

µ = 35 years σ = 13 years≈ Let’s assume, however, that we do not survey

all 400, instead we randomly select 120 people and ask them about their ages and calculate the mean age.

≈ Then, we put them back into the community and randomly select another 120 residents (may include members of the first sample).

≈ We did this over and over and each time we calculate the mean age. ≈ The results will be like those in the following table.

Page 7: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin7

Distribution of 20 random sample means (n=20)

Sample NumberSample mean

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

34.7

35.9

35.5

34.7

34.5

34.4

35.7

34.6

37.4

35.3

34.1

35.5

34.9

36.2

35.6

35.0

35.1

36.4

35.6

33.6

353433 36 37

. . ..…..…

.… .. ..

All the results are clustered around the population value (35 years), with a few

scores a bit further out and one extremescore of 37.4 years (random variation).

Those 400 people in the community have age range from 2 to 69 years, while the means of

the samples have a very narrow range of valuesof about 4 years and 10 samples coincide with

the population mean (35 years).

Page 8: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin8

A sampling distribution: is the theoretical probability distribution of an infinite number of sample outcomes for a statistics, using random samples of equal size.

Most of the samples will cluster around the population parameters with occasional sample result falling relatively further to one side or the other of the distribution (this called the sampling distribution of sample means).

Has the following properties:

The mean of the sampling distribution is equal to the population mean, the average of the averages (µχ) will be the same as the population mean. The standard error will be related to the standard deviation of the population. The sampling distribution will be normally distributed.

Page 9: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin9

PopulationParameters

Mean S.D

Sample Mean

S.D

Standard error of the mean

The degree the sample statistics are deviating /different from the population parameters.

Page 10: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin10

Central Limit TheoremIf you draw a sample from a population and calculate its mean, how close

have you come to knowing the mean of the population?

How your mean expresses and close enough to the population mean?

When many samples are drawn from a population, the means of these samples tend to be distributed normally and the mean of the means is

very close to the actual mean of the population .

├The Z-score can be calculated for these means┤To calculate we need to know the S.D of the means, (σ/√n) this new S.D of the means is called the standard error of the mean.

The term error indicates the fact that due to sampling error, each sample mean is likely to deviate some what from true population mean.

Page 11: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin11

Central Limit TheoremThe formula for SEM= Standard

deviation/ Square root of n.The formula indicates that we are estimating the SE given the S.D of a sample of size n.For a sample of 100 and S.D of 40 the SEM = 40 /√100 = 4.For a sample of 1000 and S.D of 40 the SEM = 40 / √1000 = 1.26.

Two factors influence the SEM, sample size and S.D of the sample: Sample size has greater impact as it is used a denominator .

For a sample of 100 and S.D of 20 the SEM = 20 / √100 = 2.For a sample of 100 and S.D of 40 the SEM = 40 / √100 = 4.If there is more variability within a sample the greater

the SEM.

Page 12: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin12

Standard Deviation V. Standard Error

The value of σ measures the standard deviation in the population and is based on measurements of individuals “tell us how much variability can

be expected among individuals.”

While SEM (the standard deviation of the means) tell us how much variability can be found among means of the samples.

SEM pertains to means not to individuals.

Page 13: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin13

Estimation and Confidence Intervals

Introduction and conceptDr Tarek Tawfik

Page 14: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin14

Important Concepts

The properties of the sampling distribution:

1 -The mean of the sampling distribution is equal to the population mean.

In other words, although individual results will vary from sample to sample, on average the sample means will be equal to that of the population = Unbiased Estimator

Page 15: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin15

Properties

2 -The spread of sample results around the population value is affected by the sample size .

The standard deviation of the sampling distribution, called the standard error.

As the sample size increases the SEM will decrease (the sample results are more tightly clustered around the population value)= Efficient Estimator of the population values.

n

Page 16: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin16

Properties

3 -The sampling distribution of sample means is normal (Z-score is applicable). Around 95 % of repeated samples will have a mean of 1.96 Standard errors of the population mean.

This allow us to specify a range or INTERVAL of scores within which 95 % of all possible sample means will fall, defined by:

nInterval 96.1

95 % of all possible means will fall in the interval

Page 17: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin17

Estimation

In research we usually have a single sample and we need to estimate the population value from such sample.

In such cases the population parameters are unknown, and we assume that the sample actually fall within a certain region of the sampling distribution

Our sample is one of the 95 % of all possible samples that fall within ±1.96 standard error from the population mean (Confidence Level), so our estimate of the population value is valid.

Page 18: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin18

PopulationValue

Lower limit

Sample

Upper limit

Page 19: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin19

Areas under the standard normal curve.ZArea under curve between

both points (Around the mean)Beyond both points

)two tails(

Beyond one point

)one tail(

±0.1

±0.2

±0.3

±0.4

±0.5

±0.6

±0.7

±0.8

±0.9

±1

±1.1

±1.2

±1.3

±1.4

±1.5

±1.6

±1.645

±1.7

±1.8

±1.9

1.96±2

±2.1

±2.2

±2.3

±2.4

±3

0.080

0.159

0.236

0.311

0.383

0.451

0.516

0.576

0.632

0.683

0.729

0.770

0.806

0.838

0.866

0.890

0.900

0.911

0.928

0.943

0.9500.954

0.964

0.972

0.979

0.984

0.996

0.920

0.841

0.764

0.689

0.617

0.549

0.484

0.424

0.368

0.317

0.271

0.230

0.194

0.162

0.134

0.110

0.100

0.089

0.072

0.057

0.050

0.046

0.036

0.028

0.021

0.010

0.004

0.4600

0.4205

0.3820

0.3445

0.3085

0.2745

0.2420

0.2120

0.1840

0.1585

0.1355

0.1150

0.0970

0.0810

0.0670

0.0550

0.0500

0.0445

0.0360

0.0290

0.0250

0.0230

0.0180

0.0140

0.0105

0.0100

0.0020

Page 20: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin20

Calculation of The Confidence Level1 -Calculation of the standard error:

We do not have the S.D of the population, so we use sample S.D.

2 -Mean – 1.96 S.E = the furthest the population parameter can be below the sample and value remains within the 95 % region = The lower limit.

3 -Mean + 1.96 S.E = the furthest the population parameter can be above the sample and the value remains within the 95 % region = The upper limit.

n

DSs

z

nsCi

.

96.1

Mean +1.96S.E -1.96 S.E

Lower limit Upper Limit

Page 21: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin21

ExampleA random sample of 81 newborn infants from

a hospital in a poor neighborhood during the last year had a mean birth weight of 100

oz, with a SD of 27 oz.

What is the point estimate for the unknown true value of the average (mean) birth weight of all infants born in that hospital in the last year (the population parameter)?

Page 22: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin22

The difference between the sample mean of 100 oz and the unknown population mean (which is estimate) is the sampling error.

Because the point estimate, 100 oz, is a single number, it gives no indication of its sampling error. CIs computed from random sample enable us to measure sampling error in numeric terms.

Page 23: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin23

What is the value of the 95 % CI estimate from mean birth weight?

1 -Calculate the SEM (SD/√n):

27√ / 81=27 /9=3

2 -Calculate the 95 % CI:

Mean ± 1.96 (SEM).

100 ± 1.96)3(

100 ± 5.88

)94.12 to 105.88 oz.(

A confidence interval is the range of values, that, it is estimated,

includes a population statistic, at a specific level of confidence.

Population mean is

102 oz

Page 24: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin24

Interpretation of CI

By using the 95 % confidence level, we are prepared to be wrong only five times in every 100 samples (1 in 20).

The risk of not including the population parameter in our interval estimate is only 5.%

This probability of error is known as alpha level (ά) which is simply one minus the confidence level expressed as proportion,

1 -0.95 = 0.05.

Page 25: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin25

Changing the Confidence LevelA random sample of 200 nurses is taken and each nurse asked his or her

annual income in whole dollars .These 200 nurses have an average income of $ 35,000, SD of $ 5000.

The 95 % CI = 35,000± 1.96 (5000/ √200) = 35,000 ± 695.) 34,305 to 35,695 = ( $ this is the estimate of the average income of all nurses

with a 95 % confidence. The interval width = 35,695-34,305 = $ 1390.

The 99 % CI = 35,000 ± 2.58 (5000 / √200) = 35,000 ± 915 ) 34,085 to 35,915 = (the interval width is 1830 .

The 90 % CI will be : (34,415 to 35,585), the interval width = 1170.$

Page 26: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin26

The smaller the confidence level the larger the alpha

error

)not including the true population parameter and wrong conclusions.(

Page 27: Introduction to Inferential Statistics

04/11/23Dr Tarek Amin27

Thank You