Biostatistics Community Medicine Gomal Medical College Notes

7/27/2019 Biostatistics Community Medicine Gomal Medical College Notes


STATISTICS

Derived from a Latin word, meaning information useful to the state

Numerical facts, systematically arranged.

A scientific subject that deals with the collection, compilation, presentation, analysis and interpretation of data, and with making inferences (conclusions) from data.

BIOSTATISTICS

The application of statistical methods to biological events

VITAL STATISTICS

Data from vital events such as births, deaths, marriages, divorces and fetal deaths.

It is a major source of information about the health of a population.

USES OF STATISTICS

1. To collect data in the best possible way.

2. To describe the characteristics of a group.

3. To analyze the data and draw conclusions.

SOURCES OF HEALTH STATISTICS

Registration of vital events
Notification of diseases
Records of hospitals
Census
Surveys
Surveillance
HIMS



DATA

Any collected piece of information.

Observations made on individuals.

Individual values that are measured, observed or presented.

Recorded values of characteristics of the individuals of a population or sample.

These are basic building blocks of statistics.

TYPES OF DATA

PRIMARY DATA

Data collected for the first time to answer a specific question of interest in a study.

SECONDARY DATA

Data previously gathered for some other purpose.

COLLECTION OF DATA

There are two approaches to data collection:

a. CENSUS: complete enumeration of the whole field; costly and time consuming.

b. SAMPLING: partial enumeration; saves money and time.

METHODS OF COLLECTION OF PRIMARY DATA

1. Observation
2. Questionnaire
3. Interview
4. Case studies
5. Documentation survey

METHODS OF COLLECTION OF SECONDARY DATA

1. Official publications
2. Journals & newspapers
3. Research organizations



VARIABLE

Any factor that varies.
Any quantity that varies.
Any collected piece of information that varies.
A characteristic of the individuals of a population or sample which varies from individual to individual.

Examples: age, weight, income

The variable "age of a person" can take different values, because a person can be 20 years old, 35 years old and so on.

It is the basic unit of research. All medical research is the study of relationships among variables.

It provides the yardstick on which the effects of treatments or experiences are measured; it is the characteristic of interest in a study.

TYPES OF VARIABLES

(according to the form of the characteristic of interest)

NUMERICAL / QUANTITATIVE VARIABLES

Variables whose values are expressed in numbers

Examples: age, weight, number of children, monthly income

CATEGORICAL/ QUALITATIVE VARIABLES

Variables whose values are expressed in categories

Examples:
Color: red, blue and green
Outcome of disease: recovery, chronicity and death
Also includes variables where the choice of answers is limited to yes or no.



DEPENDENT VARIABLES

The variable that is used to describe the problem under study.

Also called: Effect Variable

Example:

In a study of the relationship between mother's education and malnutrition in children, malnutrition is the dependent variable.

INDEPENDENT VARIABLES

The variables that are used to describe the factors that cause or influence the problem under study.

Also called: Cause Variable

Example:

In a study of the relationship between smoking and lung cancer, smoking is the independent variable (with values varying from not smoking to smoking more than 3 packets/day).

CONFOUNDING VARIABLE

A variable that acts as a nuisance effect and distorts the true relationship between the independent variable (exposure) and the dependent variable (disease/outcome).

Also known as: intervening, background or contaminated variable.

It confuses our research: its effect shows up in the results, but it is not the real variable of interest.

Example:

Mother's education: independent variable
Malnutrition: dependent variable
Family income: confounding variable

Common confounding variables: age, sex, socio-economic status



INDICATOR

A variable with characteristics of quality, quantity or time.

It operationalizes (defines) a variable, making it measurable; it is the measuring tool of the variable.

Example:

Variable: household income
Indicators:
high income (Rs. 5000 and above per month)
middle income (Rs. 2000-4999 per month)
low income (less than Rs. 2000 per month)

HEALTH INDICATOR

An indicator which measures different dimensions of, or changes in, health.

Example:

Number of deaths due to child bearing and the puerperium among total live births in a year (Maternal mortality rate)



Analysis between Demographic and Research variables: Tests of Significance / Hypothesis testing

Analysis between Research variables: Tests of Correlation and Regression

Assessment of the relationship between two Research variables depends upon the purpose of the research:

degree of relationship: Correlation
prediction (forecasting): Regression

Correlation and Regression are two statistical techniques used to define the relationship between two different variables when measured on the same people in a study.

Correlation: A statistical tool that tells us how close the relationship between variables is.

For example: the Age and Weight relationship of boys, to study whether a high value in age corresponds to a high value in weight.

Regression: A statistical method that uses the relationship between 2 or more variables such that the value of one variable can be predicted based on the value of the other. It predicts the value of one variable knowing the value of another variable.

Regression analysis is the methodology used for the purpose of prediction.

One variable is considered to be the predicted variable; its value varies according to the predictor variable.

For example:

predicted variable: marks obtained in exam
predictor variable: time spent on study

predicted variable: yield of crops
predictor variable: amount of rainfall
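The difference between the two techniques can be illustrated with a minimal sketch. It assumes Pearson's correlation coefficient and a least-squares straight line, neither of which is named in these notes, and the study hours and marks are hypothetical values invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Degree of linear relationship between two variables (-1 to +1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: predictor = hours of study, predicted = exam marks
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

r = pearson_r(hours, marks)               # correlation: how close is the relationship?
slope, intercept = fit_line(hours, marks) # regression: equation used for prediction
predicted_marks = intercept + slope * 6   # predicted marks for 6 hours of study

print(f"r = {r:.3f}")
print(f"marks = {intercept:.1f} + {slope:.1f} x hours")
print(f"predicted marks for 6 hours: {predicted_marks:.1f}")
```

Correlation returns a single number describing closeness of the relationship, while regression returns an equation that can be used to forecast one variable from the other.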



POPULATION (universe)

A large collection of items that have a characteristic in common.

The items may be people, animals, plants or things.

It is the entire group we are interested in, which we wish to describe or draw conclusions about.

It is the entire group about which some specific information is required or recorded.

Examples: students in a class, chairs in a class, books in a library, fish in a lake

SAMPLE

A subset of the population which is chosen for investigation.

For each population, there are many possible samples. By studying the sample, it is hoped to draw conclusions about the population.

A sample is a window through which the researcher can see the entire population.

For example:

A drop of blood (sample) will tell us about body (population) chemistry.

PARAMETER

A value associated with a population.
Any quantity which defines a characteristic of the whole population.

Assigned a GREEK letter (e.g. $\mu$).
This value is unknown and therefore has to be estimated.
A parameter is a fixed value which does not vary.

SAMPLE STATISTIC

A value calculated from a sample.
Any quantity which defines a characteristic of a sample.

Assigned a ROMAN letter (e.g. $\bar{X}$).
This value is used to give information about the unknown value in the corresponding population (parameter).



INFERENTIAL STATISTICS

The process of drawing conclusions (inferences) about a population using information (data) from samples.

There are two approaches:

1. Estimation of parameter
2. Hypothesis testing

ESTIMATION OF PARAMETER

A procedure to estimate the unknown value of a parameter by a Point Estimate or an Interval Estimate (Confidence Interval).

Point Estimate:

A single value is calculated to estimate the population value (parameter).
Example: the sample mean $\bar{X}$ is a point estimate of the population mean $\mu$.

Confidence Interval:

A range of values within which the parameter (population value) is likely to lie.



CONFIDENCE INTERVAL

Sample statistics values vary from sample to sample.

A Confidence Interval tells us how good the estimation of the parameter (population value) is, on the basis of information provided by the Sample Statistic.

It is a measure of the accuracy with which we can pinpoint the estimate of the parameter.

The CI is calculated on the basis of the Standard Error (SE), which allows us to create a CI at a specified level of probability.

The CI is constructed at Confidence Levels (CL).

CL: tells you how sure you can be.

There are 4 typical Confidence Levels: 99%, 98%, 95%, 90%

Most researchers use the 95% CL.

For example:

A 95% Confidence Interval means we can be 95% confident that the parameter lies within the Confidence Limits (the upper and lower limits of the Confidence Interval), and accept a 5% chance that it lies outside the limits. More precisely, if sampling were repeated many times, about 95% of the intervals constructed in this way would contain the parameter.
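As a rough illustration, the sketch below builds a Confidence Interval for a mean from the Standard Error. It assumes the normal approximation (a z-value of about 1.96 for 95%), which the notes do not specify; for small samples a t-value would give a somewhat wider interval. The blood pressure readings and the function name `confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Return (lower, upper) Confidence Limits for the population mean.

    Assumption: normal approximation; a t-value would be more
    appropriate for small samples.
    """
    n = len(sample)
    se = stdev(sample) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    centre = mean(sample)
    return centre - z * se, centre + z * se

# Hypothetical systolic blood pressure readings (mmHg)
readings = [118, 124, 130, 121, 127, 115, 133, 122, 126, 119]

lower, upper = confidence_interval(readings)
print(f"95% CI: {lower:.1f} to {upper:.1f}")

lower99, upper99 = confidence_interval(readings, level=0.99)
print(f"99% CI: {lower99:.1f} to {upper99:.1f}")
```

A higher Confidence Level gives a wider interval: being more sure costs some precision.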



DATA SUMMARIZATION

Arrangement pattern of data:

CENTRAL: tendency of data points to cluster in the center
SPREAD: tendency of data points to disperse towards the periphery

A summary measurement expresses this pattern as a single measure:

measure of central tendency (indicates centrality of data)
measure of dispersion (indicates scattering of data)

MEASURES OF CENTRAL TENDENCY

It is a summary statistic that describes the tendency of observations to cluster in the central part of a data set.

The most common measures: Mean, Median and Mode

MEAN

Arithmetic average of a distribution of values.

Statistically, the mean is the sum of all scores divided by the number of scores.

Mathematically, it is expressed as:

Mean of sample:

$\bar{X} = \frac{\sum X}{n}$

$\bar{X}$ (X bar) = sample mean
$\sum$ (capital sigma) = summation operator
$X$ = each individual score (sample value)
$n$ = total number of scores (sample size)

Example:

Individuals (sample size n = 4): 1, 2, 3, 4
Systolic Blood Pressure, X (individual observations): 120, 150, 110, 100

sample mean = (120 + 150 + 110 + 100) / 4 = 480 / 4 = 120

Mean of population:

$\mu = \frac{\sum X}{N}$

$\mu$ (mu) = population mean
$X$ = each individual observation in the population (population value)
$N$ = number of population members (population size)

MEDIAN

The middle value when observations are arranged in order.

The ordered data can be in ascending or descending order.

If the number of observations in a data set is odd, the middle value is chosen as the median; if it is even, the average of the two middle values is the median.

It is useful in asymmetrical distributions of data.



MODE

Most frequently occurring value in a data set

From a French word meaning "fashion".

A data set may have no mode or may have many modes.

It is occasionally used for describing a single distribution of data.
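The three measures of central tendency can be checked with Python's standard `statistics` module. This is only a sketch: the blood pressure values are the four readings from the sample mean example, and the other two lists are hypothetical.

```python
from statistics import mean, median, multimode

# Systolic blood pressure values from the sample mean example
bp = [120, 150, 110, 100]

bp_mean = mean(bp)      # sum of all scores divided by number of scores
bp_median = median(bp)  # even count, so the average of the two middle values

# Hypothetical data sets for the mode
one_mode = multimode([13, 14, 14, 15, 16, 16, 16, 17])  # single most frequent value
two_modes = multimode([1, 1, 2, 2, 3])                  # two equally frequent values

print(bp_mean, bp_median, one_mode, two_modes)
```

`multimode` returns a list, which makes it convenient for data sets that have more than one mode.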

MEASURES OF VARIATION

Also known as: DISPERSION or SCATTER

It is defined as the extent to which values in a sample or population vary about their mean.

The most common measures:

Absolute measures (compare absolute accuracy of data): Range, Variance, Standard Deviation

Relative measure (compares relative accuracy of data): Coefficient of Variation

RANGE

It is the difference between the maximum and minimum values in a series.

It is the maximum value minus the minimum value:

R = R2 - R1, where R2 is the maximum value and R1 is the minimum value



STANDARD DEVIATION

It is a measure of the dispersion of a data set.

It is the index of variability (spread) of the data about their Mean.
It tells us how much variability can be expected among individual values.
It is expressed in the same units of measurement as the original data, and is thus more meaningful than the Variance, as the square is eliminated.

The larger the Standard Deviation, the greater the dispersion.
The smaller the Standard Deviation, the closer the values are to the Mean.

Statistically defined as: the square root of the Variance.

Formula for sample standard deviation:

$S = \sqrt{V}$, where $V$ is the Variance

or

$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Formula for population standard deviation:

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

The steps to calculate the sample standard deviation are:

1. Calculate the mean of all measurements.
2. Calculate the difference between each individual measurement and the mean.
3. Square all these differences.
4. Take the sum of all squared differences.
5. Divide this sum by the number of measurements minus one (n - 1).
6. Finally take the square root of the value obtained.



Example:

11 children of 3 years of age were weighed. Their weights were:

13, 14, 14, 15, 16, 16, 16, 17, 17, 18 and 20 kilograms.

The number of measurements, n, is 11.

To calculate the standard deviation:

1. First calculate the mean, which is 176 / 11 = 16 kg.

2. Next calculate the deviation of each measurement from the mean. Ignoring the sign, these are:

3, 2, 2, 1, 0, 0, 0, 1, 1, 2, 4.

3. These values are then squared:

9, 4, 4, 1, 0, 0, 0, 1, 1, 4, 16.

4. The sum of these squared deviations is 40.

5. This sum is divided by the total number of measurements minus one (n - 1):

40 / (11 - 1) = 4

6. Finally take the square root to obtain the standard deviation:

$\sqrt{4}$ = 2 kg
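The same calculation can be reproduced in a few lines of Python. This is a sketch that follows the worked example with the 11 weights given above; the variable names are illustrative, and `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

# Weights (kg) of the 11 children in the example
weights = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

n = len(weights)
mean_weight = sum(weights) / n                      # mean of all measurements
deviations = [x - mean_weight for x in weights]     # deviation of each value from the mean
squared = [d ** 2 for d in deviations]              # squared deviations
sum_of_squares = sum(squared)                       # sum of squared deviations
variance = sum_of_squares / (n - 1)                 # divide by n - 1 (sample variance)
sd = sqrt(variance)                                 # square root gives the standard deviation

print(mean_weight, sum_of_squares, variance, sd)
print(stdev(weights))  # library cross-check of the sample standard deviation
```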



STANDARD ERROR

Also called: Standard Error of Mean

If you take more than one sample from the same population, the samples will yield different Means. The variation in these sample Means is called the Standard Error.

It is defined as:

The measure of the extent to which the sample mean deviates from the population mean.

It measures inter-sample variability.
It tells us how much variability can be expected among sample means.

$SE = \frac{SD}{\sqrt{n}}$

STANDARD ERROR OF PROPORTION

In dealing with qualitative data, the Mean and SD are not applicable, so the SE of the Mean cannot be used.

In this situation the SE of PROPORTION is applicable:

SE of Proportion = $\sqrt{\frac{pq}{n}}$

where p = proportion
q = 1 - p
n = sample size
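Both standard error formulas are easy to evaluate directly. In the sketch below, the function names `se_mean` and `se_proportion` are illustrative; the first call uses SD = 2 kg and n = 11 from the weights example, and the proportion (20% of 100 people) is a hypothetical value.

```python
from math import sqrt

def se_mean(sd, n):
    """Standard Error of the Mean: SD divided by the square root of n."""
    return sd / sqrt(n)

def se_proportion(p, n):
    """Standard Error of a Proportion: square root of p*q/n, with q = 1 - p."""
    q = 1 - p
    return sqrt(p * q / n)

# SD = 2 kg and n = 11 come from the children's weights example
se_weights = se_mean(2, 11)

# Hypothetical survey: 20% of 100 people have the characteristic of interest
se_survey = se_proportion(0.2, 100)

print(f"SE of mean = {se_weights:.3f} kg")
print(f"SE of proportion = {se_survey:.3f}")
```

Because n is in the denominator, both standard errors shrink as the sample size grows.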

COEFFICIENT OF VARIATION

It is a relative measure of dispersion.

It is used to overcome the difficulty of comparing the dispersion of data when the units of measurement are different.

Statistically speaking, it is the standard deviation of the distribution expressed as a percentage of the mean of the distribution:

coefficient of variation = (standard deviation / mean) x 100
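A short sketch shows why a relative measure helps when units differ. The weights are the 11 values from the standard deviation example; the heights are hypothetical numbers added only to have a second variable in a different unit.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]      # from the SD example
heights_cm = [92, 94, 95, 96, 97, 98, 99, 100, 101, 103, 105]  # hypothetical

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

print(f"CV of weight = {cv_weight:.1f}%")
print(f"CV of height = {cv_height:.1f}%")
```

The two standard deviations (kg and cm) cannot be compared directly, but the two percentages can.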

DEGREE OF FREEDOM

As most of our statistics is done on samples, we cannot be 100% sure; therefore, to make a conservative estimate, we use the divisor n - 1 instead of n when calculating the average deviation.

Defined as:

A measure of variability which expresses the number of options available within a space.

The number which tells us how many of the values may be independently chosen.

It is used in the calculation of: variance / SD, t-test, chi-square test

PROBABILITY

Probability means the chance of something happening.

It is a quantitative measure over all possible outcomes of a particular event.

Event              Possible outcomes
Rolling a die      1, 2, 3, 4, 5, 6
Tossing a coin     heads, tails
Drawing a card     52 cards

If an outcome is sure to occur: probability 1 (certain event)

If an outcome cannot occur: probability 0 (null event)

Range of probability: 0 to 1

Zero = no chance
One = full certainty

Probability can also be defined as:

The relative frequency of occurrence of an event.

Frequency = number of times a particular score is achieved

Relative Frequency = frequency of score / total number of scores
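Relative frequency can be computed by counting. The die rolls below are hypothetical values chosen for illustration, not data from these notes.

```python
from collections import Counter

# Hypothetical results of rolling a die 10 times
rolls = [1, 3, 6, 2, 6, 4, 6, 5, 2, 6]

frequency = Counter(rolls)  # number of times each score is achieved
relative_frequency = {
    score: count / len(rolls)  # frequency of score / total number of scores
    for score, count in frequency.items()
}

for score in sorted(relative_frequency):
    print(score, frequency[score], relative_frequency[score])
```

The relative frequencies always add up to 1, matching the 0 to 1 range of probability.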

The concept that all men are sure to die is expressed as 100%, P = 1.0

All other probabilities are measured against this standard:

1 chance in 100 = 1%, P = 0.01
1 chance in 500 = 0.2%, P = 0.002
1 chance in 1000 = 0.1%, P = 0.001

Example:

If a treatment for cancer has a 90% success rate, the remaining 10% die.

If two patients come for treatment, what is the probability that exactly one will die?

The probability of each patient dying: 0.1 (1/10)
The probability of each patient not dying: 0.9 (9/10)

Assuming the two outcomes are independent:

The probability of both dying: 0.1 x 0.1 = 0.01
The probability of both recovering: 0.9 x 0.9 = 0.81

The balance of probability: 1 - (probability of both recovering + probability of both dying)

1 - (0.81 + 0.01) = 1 - 0.82 = 0.18

The probability that exactly one will die = 0.18
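The result can be verified by listing every possible outcome for the two patients. This sketch assumes the two patients' outcomes are independent, so the probabilities multiply; the variable names are illustrative.

```python
from itertools import product
from math import isclose

# Probability of each outcome for a single patient
p = {"dies": 0.1, "recovers": 0.9}

# Enumerate all four combined outcomes for two independent patients
outcomes = {
    (first, second): p[first] * p[second]
    for first, second in product(p, repeat=2)
}

both_die = outcomes[("dies", "dies")]
both_recover = outcomes[("recovers", "recovers")]
exactly_one_dies = sum(
    prob for pair, prob in outcomes.items() if pair.count("dies") == 1
)

print(f"both die: {both_die:.2f}")
print(f"both recover: {both_recover:.2f}")
print(f"exactly one dies: {exactly_one_dies:.2f}")

# The four outcomes cover every possibility, so they sum to 1
assert isclose(sum(outcomes.values()), 1.0)
```

Direct enumeration (0.1 x 0.9 + 0.9 x 0.1) and the balance-of-probability method give the same answer.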



HYPOTHESIS

A statement of prediction.

A Statistical Hypothesis is defined as:

A statement of belief used in the evaluation of a population parameter, such as the mean of a population.

NULL HYPOTHESIS

It is the hypothesis that the samples or populations being compared in an experiment, study or test are similar. Any difference that appears is due to chance and not due to any other measurable factor.

It simply means the status quo.

The Null Hypothesis is comparable to the law courts assuming innocence until guilt is demonstrated.

HYPOTHESIS TESTING / SIGNIFICANCE TESTING

To test the viability of the Null Hypothesis in the light of experimental data.

STATISTICAL SIGNIFICANCE

It means: probably true, likely to be real.

Defined as:

A procedure by which sample results are used to decide whether to accept or reject a Null Hypothesis.



When the evidence obtained from the sample is not compatible with the Null Hypothesis, the result is STATISTICALLY SIGNIFICANT.

The decision is based on the p-value.

Small p-value (less than 0.05): low degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is rejected; the test is statistically significant.

Large p-value (greater than 0.05): high degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is not rejected (often loosely described as "accepted"); the test is not statistically significant.

p-VALUE

It measures the strength of statistical evidence in a scientific study.
It is the probability of observing a result at least as extreme as the one obtained if chance alone were operating, that is, if the Null Hypothesis were true.
It is a probability statement which measures the strength of evidence against the Null Hypothesis.

If p = 0.05, it means that there is a 5 in 100 (1 in 20) chance that a result like the one observed would arise by chance alone.

There are many different statistical tests to get a p-value:

1. Chi-square test
2. Student's t-test
3. Z test
4. ANOVA test

The test used to calculate the p-value depends upon the type of data:

for quantitative data: t-test
for qualitative data: chi-square test
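To make the decision rule concrete, here is a minimal sketch of a chi-square test on qualitative data arranged in a 2 x 2 table. It makes several assumptions not stated in the notes: no continuity correction is applied, the p-value uses the closed form that holds only for 1 degree of freedom, and the counts and the function name `chi_square_2x2` are hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Chi-square test for a 2 x 2 table, without continuity correction.

    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the Null Hypothesis were true
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = treatment, control; columns = recovered, not recovered
table = [[30, 10], [20, 20]]

chi2, p = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
if p < 0.05:
    print("Null Hypothesis rejected: statistically significant")
else:
    print("Null Hypothesis not rejected: not statistically significant")
```

In practice a statistical package would normally be used; the sketch only shows how the observed data, the Null Hypothesis and the p-value fit together.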

