Biostatistics Community Medicine Gomal Medical College Notes
7/27/2019 Biostatistics Community Medicine Gomal Medical College Notes
STATISTICS
Latin word, meaning useful to state.
Numerical facts, systematically arranged.
A scientific subject that deals with the collection, compilation, presentation, analysis and interpretation of data, and with making inferences (conclusions) from data.
BIOSTATISTICS
The application of statistical methods to biological events.
VITAL STATISTICS
Data from vital events such as births, deaths, marriages, divorces and fetal deaths.
It is a major source of information about the health of a population.
USES OF STATISTICS
1. To collect data in the best possible way.
2. To describe the characteristics of a group.
3. To analyze the data & draw conclusions.
SOURCES OF HEALTH STATISTICS
Registration of vital events
Notification of diseases
Records of hospitals
Census
Surveys
Surveillance
HIMS
DATA
Any collected piece of information.
Observations made on individuals.
Individual values that are measured, observed or presented.
Recorded values of the characteristics of individuals in a population or sample.
These are the basic building blocks of statistics.
TYPES OF DATA
PRIMARY DATA
Data collected for the first time to answer a specific question of interest in a study.
SECONDARY DATA
Data previously gathered for some other purpose.
COLLECTION OF DATA
There are two approaches to data collection:
a. CENSUS: complete enumeration of the whole field; costly and time consuming.
b. SAMPLING: partial enumeration; saves money and time.
METHODS OF COLLECTION OF PRIMARY DATA
1. Observation
2. Questionnaire
3. Interview
4. Case studies
5. Documentation survey
METHODS OF COLLECTION OF SECONDARY DATA
1. Official publications
2. Journals & newspapers
3. Research organizations
VARIABLE
Any factor that varies.
Any quantity that varies.
Any collected piece of information that varies.
A characteristic of the individuals of a population or sample which varies from individual to individual.
Examples: age, weight, income
The variable "age of a person" can take different values, because a person can be 20 years old, 35 years old and so on.
It is the basic unit of research. All medical research is the study of relationships among variables.
It provides the yardstick on which the effects of treatments or experiences are measured: it is the characteristic of interest in a study.
TYPES OF VARIABLES
According to the form of the characteristic of interest:
NUMERICAL / QUANTITATIVE VARIABLES
Variables whose values are expressed in numbers.
Examples: age, weight, number of children, monthly income
CATEGORICAL / QUALITATIVE VARIABLES
Variables whose values are expressed in categories.
Examples:
Color: red, blue and green
Outcome of disease: recovery, chronicity and death
Also variables where the choice of answers is limited to yes or no.
DEPENDENT VARIABLES
The variable that is used to describe the problem under study.
Also called: Effect Variable
Example:
A study to see the relationship between mother's education and malnutrition in children: malnutrition is the dependent variable.
INDEPENDENT VARIABLES
The variables that are used to describe the factors that cause or influence the problem under study.
Also called: Cause Variable
Example:
A study to see the relationship between smoking and lung cancer: smoking is the independent variable (with values varying from not smoking to smoking more than 3 packets/day).
CONFOUNDING VARIABLE
A variable that produces a nuisance effect which distorts the true relationship between the independent variable (exposure) & the dependent variable (disease/outcome).
Also known as: intervening, background or contaminated variable.
It confuses our research: it shows up in the research but is not the real variable of interest.
Example:
Mother's education: independent variable
Malnutrition: dependent variable
Family income: confounding variable
Common confounding variables: age, sex, socio-economic status
INDICATOR
A variable with characteristics of quality, quantity or time.
It operationalizes (defines) the variables, making them measurable: it is the measuring tool of a variable.
Example:
Variable: household income
Indicators:
high income (Rs. 5000 and above per month)
middle income (Rs. 2000-4999 per month)
low income (less than Rs. 2000 per month)
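As a rough illustration of operationalizing a variable, the income bands above can be written as a small function. The function name and the sample incomes are hypothetical; only the Rs. 2000 and Rs. 5000 cut-offs come from the notes.

```python
def income_category(monthly_income_rs):
    """Map the variable 'household income' (Rs. per month) onto its indicator category."""
    if monthly_income_rs >= 5000:
        return "high income"
    if monthly_income_rs >= 2000:
        return "middle income"
    return "low income"

# Hypothetical household incomes in Rs. per month (not from the notes)
incomes = [1500, 2000, 4999, 5000, 12000]
categories = [income_category(x) for x in incomes]
print(categories)
```

Once the cut-offs are fixed in this way, every household is measured against the same yardstick.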
HEALTH INDICATOR
An indicator which measures different dimensions of, or changes in, health.
Example:
Number of deaths due to child bearing & puerperium among total live births in a year (Maternal mortality rate; strictly speaking a ratio, since the denominator is live births).
Analysis between Demographic and Research variables: Tests of Significance / Hypothesis testing
Analysis between Research variables: Tests of Correlation and Regression
Assessment of the relationship between two Research variables depends upon the purpose of research:
degree of relationship: Correlation
prediction (forecasting): Regression
Correlation & Regression are two statistical techniques used to define the relationship between two different variables when measured on the same people in a study.
Correlation: a statistical tool that tells us how close the relationship between variables is.
For example: Age & Weight relationship of boys, to study whether a high value in age corresponds to a high value in weight of boys.
Regression: a statistical method that uses the relationship between 2 or more variables such that the value of one variable can be predicted from the value of the other. It predicts the value of one variable knowing the value of another variable.
Regression analysis is the methodology used for the purpose of prediction.
One variable is considered to be the predicted variable; its value varies according to the predictor variable.
For example:
predicted variable: marks obtained in exam
predictor variable: time spent on study
predicted variable: yield of crops
predictor variable: amount of rainfall
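A minimal sketch of both ideas, using the exam example above. The notes do not name a specific method, so Pearson's correlation coefficient and a least-squares straight line are assumed here, and the five data pairs are invented for illustration.

```python
from math import sqrt

# Hypothetical data: hours of study (predictor) and exam marks (predicted)
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Sums of products and squares of deviations from the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in marks)

# Correlation: degree of relationship (between -1 and +1)
r = sxy / sqrt(sxx * syy)

# Regression: least-squares line, marks = intercept + slope * hours
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(hours_studied):
    """Predict marks from hours of study using the fitted line."""
    return intercept + slope * hours_studied

print(f"r = {r:.3f}, predicted marks for 6 hours = {predict(6):.1f}")
```

Correlation answers "how closely do the two move together?", while the fitted line is what allows a forecast for a new value of the predictor.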
POPULATION (universe)
A large collection of items that have a characteristic in common.
The items may be people, animals, plants or things.
It is the entire group we are interested in, which we wish to describe or draw conclusions about.
It is the entire group about which some specific information is required or recorded.
Examples: students in a class, chairs in a class, books in a library, fishes in a lake
SAMPLE
A subset of the population which is chosen for investigation.
For each population, there are many possible samples. By studying the sample, it is hoped to draw conclusions about the population.
A sample is a window through which the researcher can see the entire population.
For example:
A drop of blood (sample) will tell us about the body (population) chemistry.
PARAMETER
A value associated with a population.
Any quantity which defines a characteristic of the whole population.
It is assigned a GREEK letter (for example $\mu$).
This value is unknown and therefore has to be estimated.
A parameter is a fixed value which does not vary.
SAMPLE STATISTIC
A value calculated from a sample.
Any quantity which defines a characteristic of a sample.
It is assigned a ROMAN letter (for example $\bar{X}$).
This value is used to give information about the unknown value in the corresponding population (parameter).
INFERENTIAL STATISTICS
The process of drawing conclusions (inferences) about a population using the information (data) in samples.
There are two approaches:
Estimation of parameter
Hypothesis testing
ESTIMATION OF PARAMETER
A procedure to estimate the unknown value of a parameter by a Point estimate or an Interval estimate (Confidence Interval).
Point Estimate:
A single value is calculated to estimate the population value (parameter).
Example: $\bar{X}$ of the sample is a point estimate of $\mu$ (the population value).
Confidence Interval:
Range of values within which the parameter (population value) is likely to occur.
CONFIDENCE INTERVAL
Sample statistic values vary from sample to sample.
The Confidence Interval tells us how good the estimation of the parameter (population value) is, on the basis of the information provided by the sample statistic.
It is a measure of the precision with which we can pinpoint the estimate of the parameter.
The CI is calculated on the basis of the SE (Standard Error), which allows us to create a CI at a specified level of probability.
The CI is constructed at Confidence Levels (CL).
CL: tells you how sure you can be.
There are 4 typical Confidence Levels: 99%, 98%, 95%, 90%.
Most researchers use the 95% CL.
For example:
A 95% Confidence Interval means that we can be 95% confident that the parameter lies within the Confidence Limits (upper & lower limits of the Confidence Interval): if sampling were repeated many times, about 95% of the intervals built this way would contain the parameter and about 5% would not.
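One common way to build such an interval for a mean is sketched below. It assumes a large sample, so that the normal distribution can be used (critical value about 1.96 for 95%), and it takes the standard error as SD divided by the square root of n. The sample summary figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary (not from the notes)
sample_mean = 16.0   # kg
sample_sd = 2.0      # kg
n = 100              # sample size

confidence_level = 0.95
# Two-sided critical value of the standard normal distribution (about 1.96 for 95%)
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = sample_sd / sqrt(n)
lower_limit = sample_mean - z * standard_error   # lower Confidence Limit
upper_limit = sample_mean + z * standard_error   # upper Confidence Limit

print(f"{confidence_level:.0%} CI: {lower_limit:.2f} to {upper_limit:.2f} kg")
```

Raising the Confidence Level to 99% widens the interval, and a larger sample narrows it, because the standard error shrinks.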
DATA SUMMARIZATION
Arrangement pattern of data:
CENTRAL: tendency of data points to cluster in the center
SPREAD: tendency of data points to disperse towards the periphery
Summary measurement that expresses this as a single measure:
measure of central tendency (indicates centrality of data)
measure of dispersion (indicates scattering of data)
MEASURES OF CENTRAL TENDENCY
It is a summary statistic that describes the tendency of observations to cluster in the central part of a data set.
The most common measures: Mean, Median & Mode
MEAN
Arithmetic average of a distribution of values.
Statistically, the mean is the sum of all scores divided by the number of scores.
Mathematically, it is expressed as:
Mean of sample:
$$\bar{X} = \frac{\sum x}{n}$$
$\bar{X}$ (X bar) = sample mean
$\sum$ (capital sigma) = summation operator
$x$ = each individual score (sample value)
$n$ = total number of scores (sample size)
Example:
Individual (sample size n = 4):                         1,   2,   3,   4
Systolic blood pressure (x, individual observation):  120, 150, 110, 100
$$\text{sample mean} = \frac{120 + 150 + 110 + 100}{4} = 120$$
Mean of population:
$$\mu = \frac{\sum x}{N}$$
$\mu$ (mu) = population mean
$x$ = each individual observation in the population (population value)
$N$ = number of population members (population size)
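The sample mean of the four systolic blood pressure readings can be checked with a few lines of code; this is only a sketch of the arithmetic, with variable names chosen for illustration.

```python
# Systolic blood pressure readings of the four individuals in the notes
systolic_bp = [120, 150, 110, 100]

n = len(systolic_bp)                 # sample size
sample_mean = sum(systolic_bp) / n   # sum of all scores divided by number of scores

print(sample_mean)  # 120.0
```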
MEDIAN
Middle value when observations are arranged as ordered data.
Ordered data can be in ascending or descending order.
If the total number of observations in a data set is odd, then the middle-most value is chosen as the median; if it is even, then the average of the two middle values is the median.
It is useful when the distribution of data is asymmetrical.
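The odd/even rule can be sketched as a short function. The function name is illustrative, and the data reuse blood pressure values of the kind shown in these notes.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of observations

print(median([120, 150, 110]))        # 120
print(median([120, 150, 110, 100]))   # 115.0
```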
MODE
Most frequently occurring value in a data set.
French word, meaning fashion.
A data set may have no mode or may have many modes.
It is occasionally used for describing a single distribution of data.
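A small sketch showing the three situations: one mode, many modes and no mode. It assumes the convention that a non-empty data set in which every value occurs only once has no mode; the function name and the data are illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequently occurring value(s) of a non-empty data set."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))      # [2]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
print(modes([1, 2, 3]))         # []
```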
MEASURES OF VARIATION
Also known as: DISPERSION or SCATTER
It is defined as the extent to which values in a sample or population vary about their mean.
The most common measures:
Absolute measures (compare absolute accuracy of data): Range, Variance, Standard Deviation
Relative measure (compares relative accuracy of data): Coefficient of Variation
RANGE
It is the difference between the maximum and minimum values in a series.
It is the maximum value minus the minimum value:
$$R = R_2 - R_1$$
where $R_2$ is the maximum value and $R_1$ is the minimum value.
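For instance, applying this to the four systolic blood pressure readings used earlier in these notes (a sketch; the variable names are illustrative):

```python
# Systolic blood pressure readings from the mean example in the notes
systolic_bp = [120, 150, 110, 100]

value_range = max(systolic_bp) - min(systolic_bp)   # maximum minus minimum

print(value_range)  # 50
```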
STANDARD DEVIATION
It is a measure of dispersion of a data set.
It is the index of variability (spread) of the data about their Mean.
It tells us how much variability can be expected among individual values.
It is expressed in the same units of measurement as the original data, and is thus more meaningful than the Variance, as the square is eliminated.
The larger the Standard Deviation, the greater the dispersion.
The smaller the Standard Deviation, the closer the values are to the Mean.
Statistically defined as: square root of the Variance.
Formula for sample standard deviation:
$$S = \sqrt{V}$$
where $V$ is the Variance, or
$$S = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$$
Formula for population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
The steps to calculate the sample standard deviation are:
1. Calculate the mean of all measurements.
2. Calculate the difference between each individual measurement and the mean.
3. Square all these differences.
4. Take the sum of all squared differences.
5. Divide this sum by the number of measurements minus one (n-1).
6. Finally take the square root of the value obtained.
Example:
11 children of 3 years of age were weighed. Their weights were:
13, 14, 14, 15, 16, 16, 16, 17, 17, 18 and 20 kilograms.
The number of measurements n is 11.
To calculate the standard deviation:
1. First calculate the mean, which is 16 kg.
2. Next calculate the deviation of each measurement from the mean (ignoring the sign). These are:
3, 2, 2, 1, 0, 0, 0, 1, 1, 2, 4.
3. These values are then squared:
9, 4, 4, 1, 0, 0, 0, 1, 1, 4, 16.
4. The sum of these squared deviations is 40.
5. This sum is divided by the total number of measurements minus one (n-1):
40 / (11 - 1) = 4
6. Finally take the square root to obtain the standard deviation:
$$\sqrt{4} = 2 \text{ kg}$$
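The worked example above can be reproduced step by step in code. This is a sketch of the same arithmetic, with variable names chosen for illustration.

```python
from math import sqrt

# Weights (kg) of the 11 children from the example
weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

n = len(weights_kg)                                        # number of measurements
mean = sum(weights_kg) / n                                 # mean of all measurements
squared_deviations = [(x - mean) ** 2 for x in weights_kg] # squared deviation of each value
sum_of_squares = sum(squared_deviations)                   # sum of squared deviations
variance = sum_of_squares / (n - 1)                        # divide by n - 1
sd = sqrt(variance)                                        # square root gives the SD

print(mean, sum_of_squares, variance, sd)  # 16.0 40.0 4.0 2.0
```

Dividing by n instead of n-1 in the variance line would give the population formula rather than the sample formula.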
STANDARD ERROR
Also called: Standard Error of Mean
If you take more than one sample from the same population, the samples will yield different Means. The variation in these sample Means is called the Standard Error.
It is defined as the measure of the extent to which the sample mean deviates from the population mean.
It measures inter-sample variability.
It tells us how much variability can be expected among sample means.
$$SE = \frac{SD}{\sqrt{n}}$$
where $n$ is the sample size.
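As a quick numerical sketch, the SE formula can be applied to the children's weights example from the standard deviation section (SD = 2 kg, n = 11):

```python
from math import sqrt

sd = 2.0   # standard deviation of the 11 weights (kg), from the SD example
n = 11     # sample size

standard_error = sd / sqrt(n)   # SE = SD / square root of n

print(f"{standard_error:.3f}")  # 0.603
```

The SE is smaller than the SD because sample means vary less than individual values do.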
STANDARD ERROR OF PROPORTION
In dealing with qualitative data, the Mean and SD are not applicable, so there is no SE of Mean for qualitative data.
In this situation the SE of PROPORTION is applicable:
$$\text{SE of Proportion} = \sqrt{\frac{pq}{n}}$$
where
p = proportion
q = 1 - p
n = sample size
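A short sketch of the calculation, using a hypothetical survey in which 20 of 100 people have the characteristic of interest (these figures are invented for illustration):

```python
from math import sqrt

# Hypothetical survey result (not from the notes)
n = 100          # sample size
p = 20 / n       # proportion with the characteristic
q = 1 - p        # proportion without it

se_proportion = sqrt(p * q / n)   # SE of Proportion = square root of (pq / n)

print(f"{se_proportion:.3f}")  # 0.040
```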
COEFFICIENT OF VARIATION
It is a relative measure of dispersion.
It is used to overcome the difficulty of comparing the dispersion of data when the units of measurement are different.
Statistically speaking, it is the standard deviation of the distribution expressed as a percentage of the mean of the distribution:
$$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
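The sketch below compares two measurements recorded in different units. The weight figures (mean 16 kg, SD 2 kg) come from the standard deviation example in these notes; the height figures are hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

weight_cv = coefficient_of_variation(2, 16)   # weights in kg, from the SD example
height_cv = coefficient_of_variation(6, 96)   # hypothetical heights in cm

print(weight_cv, height_cv)  # 12.5 6.25
```

Because both results are percentages, they can be compared directly even though the original units (kg and cm) differ; here the weights are relatively more variable.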
DEGREE OF FREEDOM
As most of our statistics is done on samples, we cannot be 100% sure; therefore, to make a conservative estimate, we use the divisor n-1 instead of n for the average deviation.
Defined as:
a measure of variability which expresses the number of options available within a space
the number which tells us how many of the values may be independently chosen
It is used in the calculation of: variance / SD, t-test, chi-square test
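One way to see why only n-1 values are free: deviations from the sample mean always sum to zero, so once n-1 of them are known the last one is fixed. The sketch below illustrates this with the children's weights from the standard deviation example; the variable names are illustrative.

```python
# Weights (kg) of the 11 children from the standard deviation example
weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

n = len(weights_kg)
mean = sum(weights_kg) / n
deviations = [x - mean for x in weights_kg]   # signed deviations from the mean

total = sum(deviations)                       # always zero for deviations about the mean
last_from_others = -sum(deviations[:-1])      # the last deviation is fixed by the first n - 1
degrees_of_freedom = n - 1                    # number of deviations free to vary

print(total, last_from_others, deviations[-1], degrees_of_freedom)  # 0.0 4.0 4.0 10
```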
PROBABILITY
Probability means the chance of something happening.
It is a quantitative measure over all possible outcomes of a particular event.
Event             Possible outcomes
Rolling a die     1, 2, 3, 4, 5, 6
Tossing a coin    heads, tails
Drawing a card    52 cards
If an outcome is sure to occur: probability 1 (certain event)
If an outcome cannot occur: probability 0 (null event)
Range of probability: 0-1
Zero = no chance
One = full certainty
Probability can also be defined as the relative frequency of occurrence of an event.
Frequency = number of times a particular score is achieved
$$\text{Relative Frequency} = \frac{\text{frequency of score}}{\text{total number of scores}}$$
The concept that all men are sure to die is expressed as 100%, P = 1.0
All other probabilities are measured against this standard:
1 chance in 100 = 1%, P = 0.01
1 chance in 500 = 0.2%, P = 0.002
1 chance in 1000 = 0.1%, P = 0.001
Example:
A treatment for cancer has a 90% success rate; the remaining 10% die.
If two patients come for treatment, what is the probability that exactly one will die?
The probability of each patient dying: 0.1 (1/10)
The probability of each patient not dying: 0.9 (9/10)
The probability of both dying: 0.1 × 0.1 = 0.01
The probability of both recovering: 0.9 × 0.9 = 0.81
The balance of probability = 1 - (probability of both dying + probability of both recovering)
= 1 - (0.81 + 0.01)
= 1 - 0.82 = 0.18
So the probability that exactly one will die = 0.18
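The same answer can be reached by listing all four possible outcomes for the two patients and adding up their probabilities. The sketch assumes the two patients' outcomes are independent, as the multiplication in the example implies; the variable names are illustrative.

```python
from itertools import product

p_die = 0.1             # probability that a patient dies
p_recover = 1 - p_die   # probability that a patient recovers

# Total probability for each possible number of deaths among the two patients
probability_by_deaths = {0: 0.0, 1: 0.0, 2: 0.0}
for first, second in product(["dies", "recovers"], repeat=2):
    p_first = p_die if first == "dies" else p_recover
    p_second = p_die if second == "dies" else p_recover
    deaths = [first, second].count("dies")
    probability_by_deaths[deaths] += p_first * p_second   # independent outcomes multiply

for deaths, prob in probability_by_deaths.items():
    print(deaths, round(prob, 2))
```

The case "one dies" is made up of two outcomes (first dies and second recovers, or the reverse), each with probability 0.09, which is why it totals 0.18.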
HYPOTHESIS
A statement of prediction.
Statistical Hypothesis is defined as:
A statement of belief used in the evaluation of a population parameter, such as the mean of a population.
NULL HYPOTHESIS
It is the hypothesis that the samples or populations being compared in an experiment, study or test are similar. Any difference that appears is due to chance and not due to any other measurable factor.
It simply means the status quo.
The Null Hypothesis is comparable to the law courts assuming innocence until guilt is demonstrated.
HYPOTHESIS TESTING/ SIGNIFICANCE TESTING
To test the viability of the Null Hypothesis in the light of experimental data.
STATISTICAL SIGNIFICANCE
It means: probably true, likely to be real.
Defined as:
A procedure by which sample results are used to decide whether to accept or reject a Null Hypothesis.
If the evidence obtained from the sample is not compatible with the Null Hypothesis, the result is STATISTICALLY SIGNIFICANT.
The decision is based on the p value:
Small p value (less than 0.05): low degree of compatibility between the Null Hypothesis and the observed data; Null Hypothesis rejected; test statistically significant.
Large p value (greater than 0.05): high degree of compatibility between the Null Hypothesis and the observed data; Null Hypothesis not rejected (loosely, accepted); test not statistically significant.
p-VALUE
It measures the strength of statistical evidence in a scientific study.
It is the probability of observing a result at least as extreme as the one obtained, by chance alone, if the Null Hypothesis is true.
It is a probability statement which measures the strength of evidence against the Null Hypothesis.
If p = 0.05, it means that, if the Null Hypothesis were true, there would be a 5 in 100 (1 in 20) chance of such a result arising by chance alone.
There are many different statistical tests to get a p-value:
1. Chi-square test
2. Student's t-test
3. Z test
4. ANOVA test
The test used to calculate the p-value depends upon the type of data:
for quantitative data: t-test
for qualitative data: chi-square test
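As an illustration for qualitative data, the sketch below runs a chi-square test on a hypothetical 2 x 2 table (for example, treatment group by recovered / not recovered). The counts are invented, no continuity correction is applied, and the p-value uses the fact that, for 1 degree of freedom, the upper-tail chi-square probability equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt

# Hypothetical 2 x 2 table of observed counts (not from the notes)
#            recovered  not recovered
observed = [[30, 20],    # group A
            [15, 35]]    # group B

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square = sum over cells of (observed - expected)^2 / expected
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total   # expected count under the Null Hypothesis
        chi_square += (obs - expected) ** 2 / expected

# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
p_value = erfc(sqrt(chi_square / 2))
significant = p_value < 0.05

print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}, significant = {significant}")
```

With these invented counts the p-value falls below 0.05, so the Null Hypothesis of no difference between the groups would be rejected.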