Biostatistics

Dr. Priya Narayan, Postgraduate student, Department of Oral Pathology & Microbiology, Rajarajeswari Dental College & Hospital.

Contents:-

Introduction

Measures of central tendency

Measures of dispersion

The normal curve

Tests of significance

References

Introduction

The word ‘statistics’ comes from the Italian word ‘statista’, meaning ‘statesman’, or the German word ‘statistik’, meaning ‘a political state’.

Originated from 2 main sources:

Government records

Mathematics

Registration of heads of families in ancient Egypt & Roman census

on military strength, births & deaths, etc.

John Graunt (1620-1674) – father of health statistics

STATISTICS : is the science of compiling, classifying and

tabulating numerical data & expressing the results in a

mathematical or graphical form.

BIOSTATISTICS : is that branch of statistics concerned with

mathematical facts & data related to biological events.

Uses of statistics

To assess the state of oral health in the community and to

determine the availability and utilization of dental care facilities.

To indicate the basic factors underlying the state of oral health by diagnosing the community and finding solutions to its problems.

To determine success or failure of specific oral health care

programs or to evaluate the program action.

To promote health legislation and in creating administrative

standards.

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a single estimate that summarizes a series of data.

Objective:

To condense the entire mass of data.

To facilitate comparison.

PROPERTIES

Should be easy to understand and compute.

Should be based on each and every item in the series.

Should not be affected by extreme observations.

Should be capable of further statistical computations.

Should have sampling stability.

The most common measures of central tendency used in dental sciences are :

Arithmetic mean – mathematical estimate.

Median – positional estimate.

Mode – based on frequency.

Arithmetic mean

Simplest measure of central tendency.

Ungrouped data:

Mean = (sum of all the observations in the data) / (number of observations in the data), i.e. $\bar{X} = \frac{\sum X_i}{n}$

Grouped data:

Mean = (sum of each value multiplied by its corresponding frequency) / (total frequency), i.e. $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$
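
The two mean formulas can be sketched in Python as follows. This is a minimal illustration only: the function names and the sample scores are hypothetical and not part of the original slides.

```python
def arithmetic_mean(values):
    """Mean of ungrouped data: sum of observations / number of observations."""
    return sum(values) / len(values)


def grouped_mean(values, frequencies):
    """Mean of grouped data: sum of (value x frequency) / total frequency."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / total_frequency


# Hypothetical scores of five patients (illustrative data only).
scores = [2, 4, 4, 5, 10]
print(arithmetic_mean(scores))             # 25 / 5 = 5.0

# Hypothetical grouped data: value 1 seen twice, 2 seen five times, 3 seen three times.
print(grouped_mean([1, 2, 3], [2, 5, 3]))  # (2 + 10 + 9) / 10 = 2.1
```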

MEDIAN:-

Middle value in a distribution such that one half of the units in

the distribution have a value smaller than or equal to the median

and one half have a value greater than or equal to the median.

All the observations are arranged in the order of the magnitude.

Middle value is selected as the median.

Odd number of observations: the median is the ((n+1)/2)th value.

Even number of observations: the mean of the middle two values is taken as the median.
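
A short sketch of the median procedure (arrange in order, then pick the middle value or average the middle two). The function name is an assumption made for illustration.

```python
def median(values):
    """Median: middle value of the ordered observations."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 5]))      # ordered: 1, 5, 7 -> 5
print(median([1, 2, 3, 4]))   # (2 + 3) / 2 = 2.5
```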

MODE

The mode or the modal value is that value in a series of

observations that occurs with the greatest frequency.

When mode is ill defined, it can be calculated using the relation

Mode = 3 median – 2 mean
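
The mode and the empirical relation above can be sketched as below. The helper names are hypothetical; returning every value tied for the highest frequency is one possible way to expose an ill-defined mode.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def empirical_mode(mean, median):
    """Empirical relation used when the mode is ill defined: 3 median - 2 mean."""
    return 3 * median - 2 * mean


print(modes([1, 2, 2, 3]))        # [2]
print(modes([1, 1, 2, 2]))        # [1, 2] -> mode is ill defined
print(empirical_mode(5.0, 4.0))   # 3*4 - 2*5 = 2.0
```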

Most commonly used: arithmetic mean.

When the series contains extreme values: median.

To know the most frequently occurring value in the series: mode.

Measures of dispersion

Dispersion is the degree of spread or variation of the variable

about a central value.

Measures of dispersion used:

To determine the reliability of an average.

To serve as basis for control of variability.

To compare two or more series in relation to their variability.

Facilitate further statistical analysis.

RANGE

It is the simplest measure of dispersion, defined as the difference between the value of the largest item and the value of the smallest item.

This measure gives no information about the values that lie between the extreme values.

Subject to fluctuations from sample to sample.

MEAN DEVIATION

It is the average of the absolute deviations from the arithmetic mean.

$\text{M.D.} = \frac{\sum |X_i - \bar{X}|}{n}$, where $\sum$ (sigma) is the sum of, $\bar{X}$ is the arithmetic mean, $X_i$ is the value of each observation in the data, and n is the number of observations in the data.
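
As a rough sketch, the range and the mean deviation can be computed as follows. The function names and sample data are illustrative assumptions; deviations are taken as absolute values, since signed deviations from the mean always sum to zero.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [2, 4, 4, 5, 10]       # hypothetical observations, mean = 5
print(value_range(data))      # 10 - 2 = 8
print(mean_deviation(data))   # (3 + 1 + 1 + 0 + 5) / 5 = 2.0
```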

STANDARD DEVIATION(SD)

Most important and widely used.

Also known as root mean square deviation, because it is the

square root of the mean of the squared deviations from the

arithmetic mean.

Greater the standard deviation, greater will be the magnitude of

dispersion from the mean.

A small SD means a higher degree of uniformity of the

observations.

CALCULATION

For ungrouped data:

Calculate the mean ($\bar{X}$) of the series.

Take the deviations (d) of the items from the mean: $d = X_i - \bar{X}$, where $X_i$ is the value of each observation.

Square the deviations ($d^2$) and obtain the total ($\sum d^2$).

Divide $\sum d^2$ by one less than the total number of observations, i.e. (n-1), and obtain the square root. This gives the standard deviation.

Symbolically, the standard deviation is given by:

$SD = \sqrt{\frac{\sum d^2}{n-1}}$
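
The calculation steps for ungrouped data translate almost line by line into code. The sketch below uses a hypothetical function name and made-up data.

```python
import math


def standard_deviation(values):
    """Sample standard deviation with the (n - 1) divisor."""
    n = len(values)
    mean = sum(values) / n                            # step 1: mean of the series
    deviations = [x - mean for x in values]           # step 2: d = Xi - mean
    sum_of_squares = sum(d ** 2 for d in deviations)  # step 3: total of d squared
    return math.sqrt(sum_of_squares / (n - 1))        # step 4: divide by (n - 1), take root


# Hypothetical data: mean 3, squared deviations 4 + 1 + 0 + 1 + 4 = 10, 10 / 4 = 2.5.
print(round(standard_deviation([1, 2, 3, 4, 5]), 4))  # sqrt(2.5), about 1.5811
```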

For grouped data with single units for class intervals:

$S = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{N - 1}}$

Where,

$X_i$ is the individual observation in the class interval

$f_i$ is the corresponding frequency

$\bar{X}$ is the mean

N is the total of all frequencies

For grouped data with a range for the class interval:

$S = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{N - 1}}$

Where,

$X_i$ is the midpoint of the class interval

$f_i$ is the corresponding frequency

$\bar{X}$ is the mean

N is the total of all frequencies
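
A minimal sketch of the grouped-data calculation, assuming the deviations are squared before being weighted by frequency (as in the ungrouped case). The function name and example frequencies are hypothetical; the same function applies whether the X values are single units or class midpoints.

```python
import math


def grouped_standard_deviation(points, frequencies):
    """SD of grouped data: sqrt( sum f * (X - mean)^2 / (N - 1) )."""
    total = sum(frequencies)                                   # N, total of all frequencies
    mean = sum(x * f for x, f in zip(points, frequencies)) / total
    weighted_squares = sum(f * (x - mean) ** 2 for x, f in zip(points, frequencies))
    return math.sqrt(weighted_squares / (total - 1))


# Hypothetical table: value 1 once, 2 twice, 3 once (same as the raw series 1, 2, 2, 3).
# Mean = 2, weighted squared deviations = 1 + 0 + 1 = 2, so SD = sqrt(2 / 3).
print(round(grouped_standard_deviation([1, 2, 3], [1, 2, 1]), 4))  # about 0.8165
```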

COEFFICIENT OF VARIATION(C.V.)

A relative measure of dispersion.

To compare two or more series of data with either different units

of measurement or marked difference in mean.

$\text{C.V.} = \frac{S \times 100}{\bar{X}}$

Where, C.V. is the coefficient of variation

S is the standard deviation

$\bar{X}$ is the mean

The higher the C.V., the greater the variation in the series of data.
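
To illustrate why a relative measure is needed, the sketch below compares two series measured in different units. The heights and weights are invented numbers and the function name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation x 100) / mean, expressed as a percentage."""
    return sd * 100 / mean


# Hypothetical series: heights in cm and weights in kg cannot be compared by SD alone.
height_cv = coefficient_of_variation(sd=8, mean=160)   # 5.0 %
weight_cv = coefficient_of_variation(sd=6, mean=60)    # 10.0 %
print(height_cv, weight_cv)  # weight shows the greater relative variation
```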

NORMAL DISTRIBUTION CURVE

Also called the Gaussian curve.

In a normal or Gaussian distribution, half of the observations lie above and half below the mean.

Properties

Bell shaped.

Symmetrical about the midpoint.

Total area under the curve is 1. In the standard normal curve, the mean is 0 and the standard deviation is 1.

Height of curve is maximum at the mean and all three measures of

central tendency coincide.

The maximum number of observations is at the value of the variable corresponding to the mean; the number of observations gradually decreases on either side, with few observations at the extreme points.

Area under the curve between any two points can be found out in

terms of a relationship between the mean and the standard

deviation as follows:

Mean ± 1 SD covers 68.3% of the observations

Mean ± 2 SD covers 95.4% of the observations

Mean ± 3 SD covers 99.7% of the observations

These limits on either side of the mean are called confidence limits.

They form the basis for various tests of significance.
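
The area under a normal curve within a given number of standard deviations of the mean can be checked numerically. The sketch below uses the error function from Python's standard library; the function name is hypothetical.

```python
import math


def coverage_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 68.3 %, 95.4 % and 99.7 % for k = 1, 2 and 3.
    print(f"Mean ± {k} SD covers {coverage_within(k) * 100:.1f}% of the observations")
```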

TESTS OF SIGNIFICANCE

When different samples are drawn from the same population, their estimates differ; this is sampling variability.

Tests of significance are used to find out whether the differences between the estimates of different samples are due to sampling variation or not.

Null hypothesis

Alternative hypothesis

NULL HYPOTHESIS

There is no real difference in the sample(s) and the

population in the particular matter under consideration

and the difference found is accidental and arises out of

sampling variation.

ALTERNATIVE HYPOTHESIS

Alternative when null hypothesis is rejected.

States that there is a difference between the two groups

being compared.

LEVEL OF SIGNIFICANCE

After setting up a hypothesis, the null hypothesis should be either rejected or accepted.

The criterion for this decision is fixed in terms of a probability level (p), called the level of significance.

Small p value: the observed differences in estimates cannot be attributed to sampling variation, and the null hypothesis is rejected.

STANDARD ERROR

It is the standard deviation of a statistic such as the mean, proportion, etc.

For a proportion, it is calculated by the relation

Standard error of a proportion = $\sqrt{\frac{p \times q}{n}}$

Where,

p is the proportion of occurrence of an event in the sample

q is (1-p)

n is the sample size
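
A small sketch of the standard error of a proportion, using the p, q and n defined above. The function name and the survey figures are assumptions made for illustration.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


# Hypothetical survey: an event occurs in 20% of a sample of 100 people.
print(round(standard_error_of_proportion(0.2, 100), 4))  # sqrt(0.16 / 100) = 0.04
```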

TESTING A HYPOTHESIS

Based on the evidences gathered from the sample

2 types of error are possible while accepting or rejecting a null

hypothesis

Null hypothesis    Accepted         Rejected
True               Right            Type I error
False              Type II error    Right

STEPS IN TESTING A HYPOTHESIS

State an appropriate null hypothesis for the problem.

Calculate the suitable statistics.

Determine the degrees of freedom for the statistic.

Find the p value.

Null hypothesis is rejected if the p value is less than 0.05,

otherwise it is accepted.

TYPES OF TESTS :-

PARAMETRIC

i. Student’s ‘t’ test.

ii. One way ANOVA.

iii. Two way ANOVA.

iv. Correlation coefficient.

v. Regression analysis.

NON- PARAMETRIC

i. Wilcoxon signed rank test.

ii. Wilcoxon rank sum test.

iii. Kruskal-Wallis one way ANOVA.

iv. Friedman two way ANOVA.

v. Spearman’s rank correlation.

vi. Chi-square test.

CHI-SQUARE (χ²) TEST

It was developed by Karl Pearson.

It is the alternate method of testing the significance of

difference between two proportions.

Data is measured in terms of attributes or qualities.

Advantage : it can also be used when more than two groups

are to be compared.

Calculation of χ² statistic :-

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where, O = observed frequency and E = expected frequency.

Finding the degrees of freedom (d.f.): it depends on the number of columns & rows in the original table.

d.f. = (columns - 1) × (rows - 1).

If the degree of freedom is 1, the χ² value for a probability of 0.05 is 3.84.
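
The chi-square calculation can be sketched as below for a table of observed counts. It assumes the usual way of obtaining expected frequencies (row total × column total / grand total), which the slides do not spell out; the function name and the 2 × 2 counts are hypothetical.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under the null hypothesis of no association.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table[0]) - 1) * (len(table) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 2 table: every expected frequency is 25, every |O - E| is 5.
statistic, df = chi_square([[20, 30], [30, 20]])
print(statistic, df)   # 4.0 with 1 d.f.
```

Here the statistic (4.0) exceeds 3.84, the value for 1 d.f. at a probability of 0.05, so the null hypothesis of no association would be rejected for this made-up table.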

CHI-SQUARE WITH YATES’ CORRECTION

It is required to compensate for the use of discrete data with the continuous chi-square distribution, for tables with only 1 d.f.

It reduces the absolute magnitude of each difference (O - E) by 0.5 before squaring.

This reduces chi-square & thus corrects P (i.e., the significance of the result).

Formula used is :

$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$

It is required when chi-square is on the borderline of significance.
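
A sketch of the corrected statistic for a 2 × 2 table, in which 0.5 is subtracted from the absolute value of each (O - E) before squaring. The function name and counts are hypothetical, and expected frequencies are again assumed to come from row total × column total / grand total.

```python
def chi_square_yates(table):
    """Chi-square with Yates' correction for a 2 x 2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            # Reduce the absolute difference by 0.5 before squaring.
            statistic += (abs(observed - expected) - 0.5) ** 2 / expected
    return statistic


# Hypothetical borderline table: the uncorrected chi-square for these counts is 4.0.
print(round(chi_square_yates([[20, 30], [30, 20]]), 2))   # 4 * (4.5 ** 2) / 25 = 3.24
```

For these made-up counts the corrected value (3.24) falls below 3.84 for 1 d.f., so a result that looked significant without the correction no longer is.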

LIMITATIONS :-

It will not give a reliable result if the expected frequency in any one cell is less than 5.

In such cases, Yates’ correction is necessary, i.e., reduction of the absolute value of (O - E) by 0.5:

$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$

The test tells the presence or absence of an association between the two variables but does not measure the strength of association.

It does not indicate cause & effect. It only tells the probability of occurrence of association by chance.

STUDENT ‘T’ TEST :-

When sample size is small, the ‘t’ test is used to test the hypothesis.

This test was designed by W.S. Gosset, whose pen name was ‘Student’.

It is applied to find the significance of difference between two means, as:

Unpaired ‘t’ test.

Paired ‘t’ test.

Criteria :

The sample must be randomly selected.

The data must be quantitative.

The variable is assumed to follow a normal distribution in the population.

Sample size should be less than 30.

PAIRED ‘T’ TEST

When each individual gives a pair of observations.

To test for the difference in the pair values.

Test procedure is as follows:

Null hypothesis

Difference in each set of paired observations is calculated : $d = X_1 - X_2$

Mean of differences, $\bar{D} = \frac{\sum d}{n}$, where n is the number of pairs.

Standard deviation of differences (SD) and standard error of difference ($SD/\sqrt{n}$) are calculated.

Test statistic ‘t’ is calculated from : $t = \frac{\bar{D}}{SD/\sqrt{n}}$

Find the degrees of freedom (d.f.) : (n-1)

Compare the calculated ‘t’ value with the table value for

(n-1) d.f. to find the ‘p’ value.

If the calculated ‘t’ value is higher than the ‘t’ value at 5%,

the mean difference is significant and vice-versa.
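
The paired ‘t’ procedure can be sketched in a few lines. The function name and the before/after readings are hypothetical, and the standard deviation of the differences is assumed to use the (n-1) divisor.

```python
import math


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired observations."""
    differences = [x1 - x2 for x1, x2 in zip(first, second)]    # d = X1 - X2
    n = len(differences)                                         # number of pairs
    mean_difference = sum(differences) / n                       # mean of d
    sd = math.sqrt(sum((d - mean_difference) ** 2 for d in differences) / (n - 1))
    standard_error = sd / math.sqrt(n)                           # SD / sqrt(n)
    return mean_difference / standard_error, n - 1


# Hypothetical readings on four patients before and after treatment.
before = [5, 6, 7, 8]
after = [4, 4, 6, 6]
t, df = paired_t(before, after)
print(round(t, 3), df)   # differences 1, 2, 1, 2 give t of about 5.196 with 3 d.f.
```

The calculated value would then be compared with the tabulated ‘t’ value for 3 d.f. at the 5% level (3.182 for a two-tailed test); since 5.196 is higher, the mean difference in this invented example is significant.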

ANALYSIS OF VARIANCE(ANOVA) TEST :-

When data of three or more groups is being investigated.

It is a method of partitioning variance into parts (between & within) so as to yield independent estimates of the population variance.

This is tested with F distribution : the distribution followed by the

ratio of two independent sample estimates of a population

variance.

$F = \frac{S_1^2}{S_2^2}$. The shape depends on the d.f. values associated with $S_1^2$ & $S_2^2$.

One way ANOVA : if subgroups to be compared are

defined by just one factor.

Two way ANOVA : if subgroups are based on two

factors.
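
A minimal one-way ANOVA sketch, assuming the usual partition into a between-groups and a within-groups variance estimate whose ratio is F. The function name and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F ratio, (between d.f., within d.f.)) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations

    group_means = [sum(group) / len(group) for group in groups]
    # Between-groups sum of squares: group size x squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: squared distance of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    variance_between = ss_between / (k - 1)
    variance_within = ss_within / (n - k)
    return variance_between / variance_within, (k - 1, n - k)


# Hypothetical data for three groups; between SS = 6 (2 d.f.), within SS = 6 (6 d.f.).
f_ratio, df = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio, df)   # 3.0 (2, 6)
```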

Miscellaneous :-

Fisher’s exact test :

A test for the presence of an association between categorical

variables.

Used when the numbers involved are too small to permit the use

of a chi- square test.

Friedman’s test :

A non- parametric equivalent of the analysis of variance.

Permits the analysis of an unreplicated randomized block design.

Kruskal-Wallis test :

A non-parametric test.

Used to compare the medians of several independent samples.

It is the non-parametric equivalent of the one way ANOVA.

Mann-Whitney U test :

A non-parametric test.

Used to compare the medians of two independent samples.

McNemar’s test :

A variant of the chi-square test, used when the data is paired.

Tukey’s multiple comparison test :

It’s a test used as sequel to a significant analysis of variance test,

to determine which of several groups are actually significantly

different from one another.

It has built-in protection against an increased risk of a type 1 error.

Type 1 error : being misled by the sample evidence into rejecting

the null hypothesis when it is in fact true.

Type 2 error : being misled by the sample evidence into failing to

reject the null hypothesis when it is in fact false.

REFERENCES

Park K. Park’s Textbook of Preventive and Social Medicine, 21st ed., 2011, Bhanot, India; pp. 785-792.

Peter S. Essentials of Preventive and Community Dentistry, 4th ed.; pp. 379-386.

Mahajan BK. Methods in Biostatistics, 6th ed.

John J. Textbook of Preventive and Community Dentistry, 2nd ed.; pp. 263-268.

Prabhakara GN. Biostatistics, 1st ed.

Rao K Visweswara. Biostatistics: A Manual of Statistical Methods for Use in Health, Nutrition & Anthropology, 2nd ed., 2007.

Raveendran R, Gitanjali B. A Practical Approach to PG Dissertation, 2005.
