The standard normal curve & its application in biomedical sciences

Post on 13-Jun-2015


The Standard Normal Curve and its applications

By: Dr. Abhishek Tiwari

Introduction

The standard normal curve is based on the normal distribution.
The normal distribution is a probability distribution of a continuous variable.
It is the most important probability distribution in statistical inference.
"Normal" refers to the statistical properties of a set of data.
Most biomedical variables are said to follow it, but it is not a law.
In truth, many of these characteristics follow it only approximately.
No variable is precisely normally distributed.

Introduction

A probability distribution (PD) can be used to model the distribution of a variable of interest.
It allows us to make useful probability statements, for example about human stature and human intelligence.
A PD is a powerful tool for summarizing and describing a set of data.
It supports conclusions about a population based on a sample.
It gives the relationship between the values of a random variable and the probability of their occurrence.
It can be expressed as a graph or as a formula.

Abraham de Moivre (French) discovered the normal distribution in 1733.

Quetelet (Belgian) noticed it in the heights of army personnel.

It is also called the Gaussian distribution, after Carl Friedrich Gauss (German).

Marquis de Laplace (French) proved the central limit theorem in 1810.

For a large sample size, the sampling distribution of the mean approximately follows a normal distribution.

If the sample studied is large enough, the sampling distribution of the mean can be assumed normal for all practical purposes.
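
The central limit theorem statement can be checked with a small simulation. The sketch below is illustrative only: the exponential (skewed) population, the sample size of 50, the 2000 repetitions and the helper name `sample_means` are assumptions chosen for the demonstration and do not come from the slides.

```python
import random
import statistics


def sample_means(sample_size, repetitions, seed=0):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(repetitions)
    ]


means = sample_means(sample_size=50, repetitions=2000)
centre = statistics.fmean(means)    # expected to be near the population mean, 1.0
spread = statistics.pstdev(means)   # expected to be near 1 / sqrt(50)
within_1_sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(f"mean of sample means: {centre:.3f}")
print(f"SD of sample means:   {spread:.3f}")
print(f"share within 1 SD:    {within_1_sd:.3f}")  # a normal curve gives about 0.68
```

Although the individual observations are strongly skewed, the sample means should cluster symmetrically around the population mean, with roughly two thirds of them within one standard deviation.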

The Normal Curve

[Figure: the normal distribution, f(X) plotted against X, with the mean µ and the standard deviation σ marked]

Changing μ shifts the distribution left or right.

Changing σ increases or decreases the spread.

The normal curve is not a single curve but a family of curves, each of which is determined by its mean and standard deviation.
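
The effect of µ and σ can be seen by evaluating the normal density directly. The slides do not print the formula, so this sketch assumes the standard density f(x) = exp(-(x - µ)² / (2σ²)) / (σ√(2π)); `normal_density` is a hypothetical helper name.

```python
import math


def normal_density(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Changing mu moves the peak along the x-axis without changing its height.
peak_at_0 = normal_density(0.0, mu=0.0, sigma=1.0)
peak_at_2 = normal_density(2.0, mu=2.0, sigma=1.0)

# Doubling sigma halves the peak height: the curve becomes wider and flatter.
wide_peak = normal_density(0.0, mu=0.0, sigma=2.0)

print(round(peak_at_0, 4), round(peak_at_2, 4), round(wide_peak, 4))
```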

Mean (µ)

Mean: a measure of central tendency.
It is the center or middle of the data set, around which the observations lie.
Assumption: the frequency in each class is uniformly distributed and can be represented by the class midpoint.
The mean for grouped data is given by

$\bar{x} = \frac{\sum f_i x_i}{n}$

where
n = number of observations
f_i = frequency of the i-th class interval
x_i = midpoint of the i-th class interval

Standard Deviation (σ)

Standard deviation: a measure of dispersion.
It is the average deviation of the observations around the mean.
It describes the compactness or variation of the data.
SD = root mean square deviation:

$\sigma = \sqrt{\text{variance}} = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$

where
n = number of observations
$\bar{x}$ = mean of the frequency distribution
f_i = frequency of the i-th class interval
x_i = midpoint of the i-th class interval
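
The grouped-data mean and standard deviation can be computed as in the sketch below. The class midpoints and frequencies are made-up illustration values, the function names are hypothetical, and the divisor n follows the "root mean square deviation" definition used on the slide.

```python
import math


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / n."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


def grouped_sd(midpoints, frequencies):
    """Root mean square deviation of grouped data (divisor n)."""
    n = sum(frequencies)
    mean = grouped_mean(midpoints, frequencies)
    variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
    return math.sqrt(variance)


# Hypothetical class intervals 5-15, 15-25, 25-35, represented by their midpoints.
midpoints = [10, 20, 30]
frequencies = [1, 2, 1]

print(grouped_mean(midpoints, frequencies))
print(round(grouped_sd(midpoints, frequencies), 3))
```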

Properties Of Normal Curve

Perfectly symmetrical about its mean µ
Has a so-called "bell-shaped" form
Unimodal and unskewed
The mean of the distribution is the midpoint of the curve, and mean = median = mode
Two points of inflection
The tails are asymptotic
As the number of observations n → ∞ and the width of the class interval → 0, the frequency polygon approaches a smooth curve

The “area under the curve” is measured in standard deviations from the mean

Total area between the curve and the x-axis = 1 square unit (it represents the total probability)

Any normal curve can be transformed to a standard curve for comparison

The proportion of the area under the curve is the relative frequency of the corresponding z-scores

The standard curve has mean = 0 and SD = 1 (the unit normal distribution)

Properties of the normal curve

General relationships:
±1 SD = about 68.26%
±2 SD = about 95.44%
±3 SD = about 99.72%

[Figure: standard normal curve, z from -5 to 5, with the areas within ±1, ±2 and ±3 SD marked as 68.26%, 95.44% and 99.72%]
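
These proportions can be recomputed from the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; `area_within` is a hypothetical helper name. The computed values (about 68.27%, 95.45% and 99.73%) differ marginally from the figures quoted above, which appear to come from rounded four-figure tables.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.2%}")
```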

Consider the distribution of weights in a group of 120 runners:

mean = 127.8

SD = 15.5

68-95-99.7 Rule

68% of the data lie within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs.

[Figure: histogram of the runners' weights (x-axis: weight, 80 to 160; y-axis: percent, 0 to 25), with the interval from 112.3 to 143.3 around the mean of 127.8 marked]

68% of 120 = 0.68 × 120 ≈ 82 runners

In fact, 79 runners fall within ±1 SD (15.5 kg) of the mean.

[Figure: histogram of the runners' weights (x-axis: weight, 80 to 160; y-axis: percent, 0 to 25), with the interval from 96.8 to 158.8 around the mean of 127.8 marked]

95% of 120 = 0.95 × 120 = 114 runners

In fact, 115 runners fall within 2 SDs of the mean.

[Figure: histogram of the runners' weights (x-axis: weight, 80 to 160; y-axis: percent, 0 to 25), with the interval from 81.3 to 174.3 around the mean of 127.8 marked]

99.7% of 120 = 0.997 × 120 ≈ 119.6 runners

In fact, all 120 runners fall within 3 SDs of the mean.
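
The intervals and expected counts in the runners example can be reproduced as follows. This is a minimal sketch using the mean, SD and group size quoted above; the names `interval` and `expected_count` are hypothetical.

```python
MEAN = 127.8
SD = 15.5
N_RUNNERS = 120

# Shares of the data expected within 1, 2 and 3 SDs under the 68-95-99.7 rule.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(k):
    """Lower and upper limits of mean ± k SD."""
    return MEAN - k * SD, MEAN + k * SD


def expected_count(k):
    """Number of runners the rule predicts within k SDs of the mean."""
    return RULE[k] * N_RUNNERS


for k in RULE:
    low, high = interval(k)
    print(f"±{k} SD: {low:.1f} to {high:.1f}, about {expected_count(k):.1f} runners expected")
```

The observed counts (79, 115 and 120) are close to these predictions, which is what "approximately normal" means in practice.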

The Standard Normal Distribution (Z)

Standard scores are expressed in standard deviation units.
They are used to compare variables measured on different scales.
There are many kinds of standard scores; the most common is the z score.
A z score shows how far the original score lies above or below the mean of a normal curve.
All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation.

Z scores

What is a z-score? A z score is a raw score expressed in standard deviation units.

Here is the formula for a z score:

$z = \frac{X - \bar{X}}{S}$

Comparing X and Z units

X scale (µ = 100, σ = 50): X = 100 and X = 200
Z scale (µ = 0, σ = 1): Z = 0 and Z = 2.0

What we need is a standardized normal curve which can be used for any normally distributed variable. Such a curve is called the Standard Normal Curve.
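
The conversion between raw units and z units can be written as a pair of small functions. This is a minimal sketch; `z_score` and `raw_score` are hypothetical helper names, and the example uses the X scale with mean 100 and SD 50 from the comparison above.

```python
def z_score(x, mean, sd):
    """Express a raw score in standard deviation units."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Convert a z score back to the original scale."""
    return mean + z * sd


print(z_score(200, mean=100, sd=50))    # a raw score of 200 is 2 SDs above the mean
print(raw_score(2.0, mean=100, sd=50))  # and z = 2.0 maps back to 200
```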

Application of Normal Curve Model

Using z scores to compare two raw scores from different distributions

Can determine relative frequency and probability
Can determine percentile rank
Can determine the proportion of scores between the mean and a particular score
Can determine the number of people within a particular range of scores by multiplying the proportion by N

Using z scores to compare two raw scores from different distributions

You score 80/100 on a statistics test and your friend also scores 80/100 on their test in another section. "Hey, congratulations," your friend says, "we are both doing equally well in statistics." What do you need to know to decide whether the two scores are equivalent?

The mean?

What if the mean of both tests was 75?

You also need to know the standard deviation.

What would you say about the two test scores if the S in your class was 5 and the S in your friend's class was 10?

Calculating z scores

What is the z score for your test: raw score = 80, mean = 75, S = 5?

$z = \frac{X - \bar{X}}{S} = \frac{80 - 75}{5} = 1$

What is the z score of your friend's test: raw score = 80, mean = 75, S = 10?

$z = \frac{X - \bar{X}}{S} = \frac{80 - 75}{10} = 0.5$

Who do you think did better on their test? Why do you think this?
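
The two test scores can be compared in a few lines. This is a minimal sketch using the means and standard deviations given above; the variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Raw score expressed in standard deviation units."""
    return (x - mean) / sd


your_z = z_score(80, mean=75, sd=5)       # your section: S = 5
friend_z = z_score(80, mean=75, sd=10)    # your friend's section: S = 10

print(f"your z = {your_z}, friend's z = {friend_z}")
if your_z > friend_z:
    print("Your 80 stands higher relative to your class than your friend's 80 does.")
```

The raw scores are identical, but your score lies a full standard deviation above your class mean while your friend's lies only half a standard deviation above theirs.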

Area under curve

Procedure: To find areas, first compute Z scores. Substitute the score of interest for $X_i$.

Use the sample mean for $\bar{X}$ and the sample standard deviation for $S$.

The formula changes a "raw" score ($X_i$) to a standardized score ($Z$):

$Z = \frac{X_i - \bar{X}}{S}$

Finding Probabilities

If a distribution has $\bar{X} = 13$ and $S = 4$:

What is the probability of randomly selecting a score of 19 or more?
Find the Z score: for $X_i = 19$, Z = (19 - 13) / 4 = 1.50.
Find the area in the Z table: 0.9332 (the area below Z).
The probability is 1 - 0.9332 = 0.0668, or about 0.07.

Areas under the curve can also be expressed as probabilities
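
The table lookup for "a score of 19 or more" can be reproduced with the standard library. This is a minimal sketch; `prob_at_least` is a hypothetical helper name, and `NormalDist().cdf` plays the role of the Z table.

```python
from statistics import NormalDist


def prob_at_least(x, mean, sd):
    """Probability that a normally distributed score is x or more."""
    z = (x - mean) / sd
    area_below = NormalDist().cdf(z)   # what the Z table gives for this z
    return 1.0 - area_below


p = prob_at_least(19, mean=13, sd=4)
print(round(p, 4))
```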

In Class Example

After an exam, you learn that the mean for the class is 60, with a standard deviation of 10. Suppose your exam score is 70.

What is your Z-score?
Where, relative to the mean, does your score lie?
What is the probability associated with your score (use the Z table)?

To solve:

Available information: $X_i = 70$, $\bar{X} = 60$, $S = 10$

Formula: $Z = (X_i - \bar{X}) / S = (70 - 60) / 10 = +1.0$

Your Z-score of +1.0 is exactly 1 SD above the mean (an area of 34.13% + 50% = 84.13% lies below it). You are at the 84.13th percentile.

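[Figure: standard normal curve, z from -5 to 5, with the mean (60) at z = 0 and your score at z = +1.0; 50% of the area lies on each side of the mean and 34.13% lies between the mean and z = 1.0 on either side; the ±1, ±2 and ±3 SD bands are marked as 68.26%, 95.44% and 99.72%]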
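
A Z table tabulates the cumulative area under the standard normal curve. As a sketch of what the lookup for the score of 70 does, the area below z can be computed from the error function, assuming the standard identity Φ(z) = ½(1 + erf(z / √2)); `area_below` is a hypothetical helper name.

```python
import math


def area_below(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = (70 - 60) / 10                  # exam score 70, class mean 60, SD 10
percentile = 100 * area_below(z)

print(f"z = {z:+.1f}, percentile = {percentile:.2f}")
```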

What if your score is 72?

Calculate your Z-score.

What percentage of students have a score below your score? Above?

What proportion of students lie between you and the mean?

What percentile are you at?

Answer:
Z = 1.2, area = 0.8849 (from the left side up to Z), so 88.49% of marks are below yours.
The area beyond Z = 1 - 0.8849 = 0.1151, so 11.51% of marks are above yours.
Area between the mean and Z = 0.8849 - 0.50 = 0.3849, about 38%.
Your mark is at the 88th percentile!

What if your mark is 55%?

Calculate your Z-score.

What percentage of students have a score below your score? Above?

What percentile are you at?

Answer:

Z = -0.5

The area beyond Z (the lower tail) = 0.3085 (30.85% of the marks are below yours)

Students above your score: 1 - 0.3085 = 0.6915 (69.15% of the marks are above yours)

Your mark is only at the 31st percentile!
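
Both "what if" answers can be checked with a normal distribution object for the class. This is a minimal sketch assuming the class mean of 60 and SD of 10 from the example; `shares` is a hypothetical helper name.

```python
from statistics import NormalDist

exam = NormalDist(mu=60, sigma=10)   # class mean and SD from the example


def shares(score):
    """Proportion of the class below and above a given score."""
    below = exam.cdf(score)
    return below, 1.0 - below


for score in (72, 55):
    below, above = shares(score)
    print(f"score {score}: {below:.2%} below, {above:.2%} above")
```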

Another Question…

What if you want to know how much better or worse you did than someone else? Suppose you have 72% and your classmate has 55%?

How much better is your score?

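Answer:
Z for 72% = 1.2, so the area between the mean and your mark = 0.8849 - 0.5 = 0.3849 (above the mean).
Z for 55% = -0.5, so the table area below it is 0.3085; 1 - 0.3085 = 0.6915 and 0.6915 - 0.5 = 0.1915 is the area between your classmate's mark and the mean (below the mean).
Area between Z = 1.2 and Z = -0.5 = 0.3849 + 0.1915 = 0.5764.

57.64% of the class lies between your classmate's mark and yours; that is how far ahead of your classmate you stand relative to the rest of the class.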
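
The area between two marks is the difference of two cumulative areas, which avoids the separate "above the mean" and "below the mean" steps. This is a minimal sketch assuming the class mean of 60 and SD of 10; `share_between` is a hypothetical helper name.

```python
from statistics import NormalDist


def share_between(low, high, mean, sd):
    """Proportion of a normal distribution lying between two raw scores."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


between = share_between(55, 72, mean=60, sd=10)
print(f"{between:.2%} of the class lies between 55% and 72%")
```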

Probability:

Let’s say your classmate won’t show you the mark….

How can you make an informed guess about what your neighbour’s mark might be?

What is the probability that your classmate has a mark between 60% (the mean) and 70% (1 s.d. above the mean)?

Answer:

Calculate Z for 70%: Z = 1.0

Looking at the Z table, you see that the area between the mean and Z is 0.3413

There is a 0.34 probability (or 34% chance) that your classmate has a mark between 60% and 70%.

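The probability of your classmate having a mark between 60% and 70% is 0.34:

[Figure: standard normal curve, z from -5 to 5, with the mean (60) at z = 0 and the mark of 70% at z = +1.0; the area between them is 34.13%, with 50% of the area on each side of the mean; the ±1, ±2 and ±3 SD bands are marked as 68.26%, 95.44% and 99.72%]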

Medical problem

The mean cholesterol of a sample is 210 mg%, with SD = 20 mg%. Cholesterol values are normally distributed in this sample of 1000 persons.

Find the number of persons with a value 1) > 210, 2) > 260, 3) < 250, 4) between 210 and 230.

1) Z1 = (210 - 210)/20 = 0, area above = 0.5, persons = 1000 × 0.5 = 500
2) Z2 = (260 - 210)/20 = 2.5, area below = 0.9938, area above = 1 - 0.9938 = 0.0062, persons = 1000 × 0.0062 = 6.2, about 6
3) Z3 = (250 - 210)/20 = 2, area below = 0.9772, persons = 1000 × 0.9772 = 977.2, about 977
4) Z4 = (230 - 210)/20 = 1, area between the mean and Z = 0.3413, persons = 1000 × 0.3413 = 341.3, about 341
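
The four cholesterol counts can be verified with the standard library. This is a minimal sketch using the mean, SD and sample size stated in the problem; the variable names are illustrative.

```python
from statistics import NormalDist

cholesterol = NormalDist(mu=210, sigma=20)   # mg%
SAMPLE_SIZE = 1000

above_210 = SAMPLE_SIZE * (1.0 - cholesterol.cdf(210))
above_260 = SAMPLE_SIZE * (1.0 - cholesterol.cdf(260))
below_250 = SAMPLE_SIZE * cholesterol.cdf(250)
between_210_230 = SAMPLE_SIZE * (cholesterol.cdf(230) - cholesterol.cdf(210))

print(f"> 210:             {above_210:.1f}")
print(f"> 260:             {above_260:.1f}")
print(f"< 250:             {below_250:.1f}")
print(f"between 210 & 230: {between_210_230:.1f}")
```

Expected counts are rarely whole numbers, so they are rounded to the nearest person when reported.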



Multiple Transformation of Data

Why z-scores?

Transforming scores in order to make comparisons, especially when using different scales

Gives information about the relative standing of a score in relation to the characteristics of the sample or population:
Location relative to the mean, measured in standard deviations
Relative frequency and percentile

Gives us information about the location of that score relative to the "average" deviation of all scores