Introduction biostatistics

Introduction to Biostatistics DR. SYED SANOWAR ALI

Transcript of Introduction biostatistics

Page 1: Introduction biostatistics

Introduction to Biostatistics

DR. SYED SANOWAR ALI

Page 2: Introduction biostatistics

CENTRAL TENDENCY

The centre of the distribution

Or

The most typical case

Page 3: Introduction biostatistics

Measures of CENTRAL TENDENCY

Given a data set, a measure of the CENTRAL TENDENCY is a value about which the observations tend to cluster.

In other words, a measure of the CENTRAL TENDENCY is a value around which a data set is centered.

Page 4: Introduction biostatistics

Measures of CENTRAL TENDENCY

The three most common measures are:
• Mean
• Median
• Mode

Page 5: Introduction biostatistics

Mean: It is the value that is closest to all the other values in a distribution.

Page 6: Introduction biostatistics

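Sample mean: $\bar{x} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{\sum x}{n}$

Population mean: $\mu = \dfrac{X_1 + X_2 + \cdots + X_N}{N} = \dfrac{\sum x}{N}$

where
• $\sum$ = summation
• $\bar{x}$ = X bar
• $\mu$ = mu
• $N$ = total number of values in population
• $n$ = total number of values in sample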

Page 7: Introduction biostatistics

Find the mean of the following five salaries: 6000, 10000, 14000, 50000, 10000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 14000, 50000
• Step 2. Add all of the observed values in the distribution. 6000 + 10000 + 10000 + 14000 + 50000 = 90000
• Step 3. Divide the sum by the number of observations. 90000 / 5 = 18000
• Therefore, the mean salary is 18000.
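The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the variable names are chosen here and do not come from the slides. Sorting (Step 1) is left out because the sum does not depend on the order of the values.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The five salaries from the slide.
salaries = [6000, 10000, 14000, 50000, 10000]

total = sum(salaries)          # Step 2: 90000
mean_salary = mean(salaries)   # Step 3: 90000 / 5

print(total, mean_salary)
```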

Page 8: Introduction biostatistics

Properties of Mean
1. One computes the mean by using all the values of the data.
2. The mean is used in computing other statistics, such as the variance.
3. The mean for a data set is unique and is not necessarily one of the data values.
4. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate measure to use in these situations.

Page 9: Introduction biostatistics

Median

Median is the middle value of a set of data that has been put into rank order. The median is also the 50th percentile of the distribution.

Page 10: Introduction biostatistics

Example A: Odd Number of Observations

Find the median of the following: 6000, 10000, 14000, 50000, 10000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 14000, 50000
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (5 + 1) / 2 = 6 / 2 = 3
• Therefore, the median will be the value at the third observation.
• Step 3. Identify the value at the middle position. Third observation = 10000

Page 11: Introduction biostatistics

Example A: Even Number of Observations

Find the median of the following: 6000, 10000, 14000, 50000, 10000, 12000
• Step 1. Arrange the values in ascending order. 6000, 10000, 10000, 12000, 14000, 50000
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (6 + 1) / 2 = 7 / 2 = 3.5
• Step 3. Identify the value at the middle position. The median equals the average of the values of the third (value = 10000) and fourth (value = 12000) observations: Median = (10000 + 12000) / 2 = 11000
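Both cases of Example A (odd and even number of observations) can be handled by one small function. The sketch below follows the (n + 1) / 2 rule from the slides; the function and variable names are illustrative assumptions.

```python
def median(values):
    """Median via the (n + 1) / 2 middle-position rule."""
    ordered = sorted(values)              # Step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2           # Step 2: whole-number position (1-based)
        return ordered[position - 1]      # Step 3: value at that position
    # Even n: (n + 1) / 2 falls halfway between positions n/2 and n/2 + 1,
    # so average the two neighbouring values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_salaries = [6000, 10000, 14000, 50000, 10000]
even_salaries = [6000, 10000, 14000, 50000, 10000, 12000]

median_odd = median(odd_salaries)     # third observation
median_even = median(even_salaries)   # average of third and fourth observations

print(median_odd, median_even)
```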

Page 12: Introduction biostatistics

Properties of Median
1. The median is used when one must find the center or middle value.
2. The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution.
3. The median is affected less than the mean by extremely high or extremely low values.

Page 13: Introduction biostatistics

Mode is the value that occurs most often in a set of data. It can be determined simply by tallying the number of times each value occurs.

Page 14: Introduction biostatistics

Mode

In the salary example (6000, 10000, 10000, 14000, 50000), the salary 10000 is the value that occurs most frequently. The mode is 10000.

It should be noted that there can be more than one mode for a data set.

Page 15: Introduction biostatistics

Properties of Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest to compute.
3. The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

Page 16: Introduction biostatistics

Find the mean of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 30, 31
• Step 2. Add all of the observed values in the distribution. 15 + 22 + 27 + 30 + 31 = 125
• Step 3. Divide the sum by the number of observations. 125 / 5 = 25.0
• Therefore, the mean incubation period is 25.0 days.

Page 17: Introduction biostatistics

Example B: Even Number of Observations

Suppose a sixth case of hepatitis A was reported. Find the median of the incubation periods: 27, 31, 15, 30, 22 and 29 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 29, 30, and 31 days
• Step 2. Find the middle position of the distribution by using (n + 1) / 2. Middle position = (6 + 1) / 2 = 7 / 2 = 3½
• Step 3. Identify the value at the middle position. The median equals the average of the values of the third (value = 27) and fourth (value = 29) observations: Median = (27 + 29) / 2 = 28 days

Page 18: Introduction biostatistics

Example B: Find the mode of the following incubation periods for hepatitis A: 27, 31, 15, 30, and 22 days.
• Step 1. Arrange the values in ascending order. 15, 22, 27, 30, and 31 days
• Step 2. Identify the value that occurs most often. None
• Note: When no value occurs more than once, the distribution is said to have no mode.

Page 19: Introduction biostatistics

The following data show the number of doses of diphtheria-pertussis-tetanus (DPT) vaccine each of seventeen 2-year-old children in a particular village received:

0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4

Two children received no doses; two children received 1 dose; three received 2 doses; six received 3 doses; and four received all 4 doses.

Therefore, the mode is 3 doses, because more children received 3 doses than any other number of doses.
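The tallying used in the mode examples can be sketched with `collections.Counter` from the Python standard library. The helper name `modes` is an assumption made for this illustration. It returns a list so that it can report more than one mode, and it returns an empty list when no value occurs more than once, matching the "no mode" convention used for the hepatitis A data.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency.

    Returns an empty list when the data set is empty or when no value
    occurs more than once (the "no mode" case).
    """
    counts = Counter(values)              # tally how often each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                         # every value occurs exactly once
    return sorted(v for v, c in counts.items() if c == highest)


dpt_doses = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4]
incubation_days = [27, 31, 15, 30, 22]
salaries = [6000, 10000, 14000, 50000, 10000]

dose_tally = Counter(dpt_doses)
print(dose_tally)
print(modes(dpt_doses), modes(incubation_days), modes(salaries))
```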

Page 20: Introduction biostatistics

Which measure of CT should you use?

The Mean is by far the most common measure of CT. It uses all of the information in the sample. This measure is very good when the distribution is symmetrical.

Page 21: Introduction biostatistics

Mean, Median and Mode

Data: 4000, 4500, 5000, 5500, 6000, 6000, 6500, 7000, 7500 and 8000

Mean = 6000
Median = 6000
Mode = 6000

Mean = Median = Mode (all the same)
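The agreement of the three measures for this symmetric data set can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original slides; the variable names are illustrative.

```python
import statistics

data = [4000, 4500, 5000, 5500, 6000, 6000, 6500, 7000, 7500, 8000]

data_mean = statistics.mean(data)      # sum / count = 60000 / 10
data_median = statistics.median(data)  # average of the 5th and 6th values
data_mode = statistics.mode(data)      # most frequent value

print(data_mean, data_median, data_mode)
```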

Page 22: Introduction biostatistics

[Figure: Normal distribution curve; horizontal axis: Salary. Mean, Median and Mode are the same.]

Page 23: Introduction biostatistics

Which measure of CT should you use?

If the distribution is skewed or there are extreme values, the Mean is artificially pulled towards the extreme value.

Age example: 19, 20, 21, 22, 49    Mean = 26.2 yrs.

Marks example: 05, 55, 57, 63, 66    Mean = 49.2

Page 24: Introduction biostatistics

Which measure of CT should you use?

Age: 19, 20, 21, 22, 49    Mean = 26.2 yrs.

Right skewed or Positively skewed

Page 25: Introduction biostatistics
Page 26: Introduction biostatistics

Which measure of CT should you use?

Marks: 05, 55, 57, 63, 66    Mean = 49.2

Left skewed or Negatively skewed

Page 27: Introduction biostatistics
Page 28: Introduction biostatistics

Which measure of CT should you use?
• If the distribution is skewed or there are extreme values, the Median proves to be a better measure of the CT.
• Median is resistant to extreme observations.
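To see the resistance of the median in numbers, the age and marks examples from the earlier slides can be run through Python's `statistics` module. The medians below are computed here for illustration; the slides themselves only state the means.

```python
from statistics import mean, median

ages = [19, 20, 21, 22, 49]     # one unusually high value (right skewed)
marks = [5, 55, 57, 63, 66]     # one unusually low value (left skewed)

age_mean, age_median = mean(ages), median(ages)
marks_mean, marks_median = mean(marks), median(marks)

# The mean is pulled towards the extreme value; the median stays with
# the bulk of the observations.
print("Age:   mean =", age_mean, " median =", age_median)
print("Marks: mean =", marks_mean, " median =", marks_median)
```

In both data sets the median sits among the typical values, while the mean is shifted in the direction of the single extreme observation.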

Page 29: Introduction biostatistics

Which measure of CT should you use?
• Mode is commonly used as a measure of popularity that reflects the CT of opinion.
• Examples:
1. Most preferred pain killer
2. Most preferred model of washing machine
3. Most popular candidate

Page 30: Introduction biostatistics

Most fighting cricket team
• Pakistan = 1
• Australia = 2
• India = 3
• England = 4

Responses (coded):
1, 2, 4, 1, 2, 1, 3, 1, 4, 1,
2, 1, 3, 2, 4, 4, 1, 1, 1, 4,
3, 1, 1, 4, 2, 1, 1, 2, 1, 2,
1, 4, 1, 1, 3, 2, 4, 1, 4, 1

Frequency of each code: 1 = 19, 2 = 8, 3 = 4, 4 = 9

Which measure of CT should you use?

Mean (2.075)    Median (2)    Mode (1)
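The figures for the cricket survey can be reproduced with a short tally. The sketch below assumes the 40 coded responses are the four rows of ten shown on the slide; the dictionary `teams` and the other names are illustrative. The codes are only labels for teams, so the mean and median of the codes are computed here just to reproduce the slide's numbers.

```python
from collections import Counter
from statistics import mean, median

teams = {1: "Pakistan", 2: "Australia", 3: "India", 4: "England"}

responses = [
    1, 2, 4, 1, 2, 1, 3, 1, 4, 1,
    2, 1, 3, 2, 4, 4, 1, 1, 1, 4,
    3, 1, 1, 4, 2, 1, 1, 2, 1, 2,
    1, 4, 1, 1, 3, 2, 4, 1, 4, 1,
]

frequencies = Counter(responses)                    # tally per team code
mode_code, mode_count = frequencies.most_common(1)[0]

code_mean = mean(responses)      # arithmetic on labels, shown for comparison only
code_median = median(responses)

for code in sorted(frequencies):
    print(teams[code], frequencies[code])
print("Mode:", teams[mode_code], "| mean of codes:", code_mean,
      "| median of codes:", code_median)
```

Only the mode names an actual answer (the most frequently chosen team); a mean of 2.075 does not correspond to any team.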

Page 31: Introduction biostatistics

Measurement of Variation

OR

Measurement of Dispersion

Page 32: Introduction biostatistics

Range

The range is the simplest measure of variation to find. It is simply the highest value minus the lowest value.

RANGE = MAXIMUM - MINIMUM

Since the range only uses the largest and smallest values, it is greatly affected by extreme values; that is, it is not resistant to change.
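As a small illustration of how strongly one extreme value affects the range, the sketch below reuses the five salaries from the mean example. The function name `value_range` is an assumption made here.

```python
def value_range(values):
    """Range = maximum - minimum."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


salaries = [6000, 10000, 10000, 14000, 50000]

full_range = value_range(salaries)             # 50000 - 6000
range_without_top = value_range(salaries[:-1]) # drop the 50000 salary: 14000 - 6000

print(full_range, range_without_top)
```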

Page 33: Introduction biostatistics

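Variance (σ²)

The Variance is defined as the average of the squared differences from the Mean.

$\sigma^2 = \dfrac{\sum (X_i - \bar{x})^2}{N - 1}$ (if sample size ≤ 30)

$\sigma^2 = \dfrac{\sum (X_i - \bar{x})^2}{N}$ (otherwise)

Note: dividing by N - 1 gives the sample variance, and it is the usual choice whenever the data are a sample rather than the whole population; for large N the two formulas give nearly the same result.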
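A worked sketch of both divisors, using the five hepatitis A incubation periods from the earlier example (mean 25.0 days). The function name `variance` and its `sample` flag are assumptions introduced for this illustration.

```python
def variance(values, sample=True):
    """Average squared difference from the mean.

    sample=True divides by N - 1; sample=False divides by N.
    At least two values are required here so that both divisors are non-zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    squared_differences = sum((x - m) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return squared_differences / divisor


incubation_days = [15, 22, 27, 30, 31]

# Squared differences from the mean of 25: 100 + 9 + 4 + 25 + 36 = 174
var_n_minus_1 = variance(incubation_days, sample=True)    # 174 / 4
var_n = variance(incubation_days, sample=False)           # 174 / 5

print(var_n_minus_1, var_n)
```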

Page 34: Introduction biostatistics

Standard deviation (σ)

The Standard Deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek letter sigma).

The formula is easy: it is the square root of the Variance.

$\sigma = \sqrt{\sigma^2}$
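Taking the square root of the variance can be sketched with the standard library. The data are the hepatitis A incubation periods used earlier in the deck, and the N - 1 divisor is assumed because there are only five observations. `statistics.stdev` is shown alongside as a cross-check.

```python
import math
import statistics

incubation_days = [15, 22, 27, 30, 31]

sample_variance = statistics.variance(incubation_days)  # N - 1 divisor
sample_sd = math.sqrt(sample_variance)                  # square root of the variance

# statistics.stdev computes the same quantity directly.
sd_direct = statistics.stdev(incubation_days)

print(round(sample_sd, 2), round(sd_direct, 2))
```

The result, about 6.6 days, is in the same units as the data, which is what makes the standard deviation easier to interpret than the variance.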

Page 35: Introduction biostatistics

Coefficient of variation (Cv)

The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.

$C_v = \dfrac{\text{Standard Deviation}}{\text{Mean}} \times 100$
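As an illustration of comparing two series with very different means, the sketch below computes Cv for two data sets that appear earlier in the deck: the hepatitis A incubation periods and the five salaries. Pairing these two series is a choice made here, not in the slides, and the sample standard deviation (N - 1 divisor) is assumed.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Cv = (standard deviation / mean) x 100, using the sample SD."""
    m = mean(values)
    if m == 0:
        raise ValueError("Cv is undefined when the mean is zero")
    return stdev(values) / m * 100


incubation_days = [15, 22, 27, 30, 31]              # mean 25 days
salaries = [6000, 10000, 10000, 14000, 50000]       # mean 18000

cv_incubation = coefficient_of_variation(incubation_days)
cv_salaries = coefficient_of_variation(salaries)

print(round(cv_incubation, 1), round(cv_salaries, 1))
```

Because Cv is a percentage of the mean, the two results can be compared directly even though one series is measured in days and the other in currency units; here the salaries vary far more, relative to their mean, than the incubation periods.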