Biostatistics Community Medicine Gomal Medical College Notes

  • 7/27/2019 Biostatistics Community Medicine Gomal Medical College Notes

    STATISTICS

    Latin word, meaning "useful to the state"

    Numerical facts systematically arranged

    A scientific subject that deals with the collection, compilation, presentation, analysis and interpretation of data, and with making inferences (conclusions) from it.

    BIOSTATISTICS

    The applications of statistical methods to biological events

    VITAL STATISTICS

    Data from vital events such as births, deaths, marriages, divorces and fetal deaths

    It is a major source of information about the health of a population.

    USES OF STATISTICS

    1. To collect data in the best possible way.

    2. To describe the characteristics of a group.

    3. To analyze the data & draw conclusions.

    SOURCES OF HEALTH STATISTICS

    Registration of vital events
    Notification of diseases
    Records of hospitals
    Census
    Surveys
    Surveillance
    HIMS

    DATA

    Any collected piece of information
    Observations made on individuals
    Individual values that are measured, observed or presented
    Recorded values of characteristics of individuals of a population or sample

    These are the basic building blocks of statistics.

    TYPES OF DATA

    PRIMARY DATA

    Data collected for the first time to answer a specific question of interest in a study.

    SECONDARY DATA

    Data previously gathered for some other purpose.

    COLLECTION OF DATA

    There are two approaches to data collection:

    a. CENSUS: complete enumeration of the whole field; costly and time consuming

    b. SAMPLING: partial enumeration; saves money and time

    METHODS OF COLLECTION OF PRIMARY DATA

    1. Observation
    2. Questionnaire
    3. Interview
    4. Case studies
    5. Documentation survey

    METHODS OF COLLECTION OF SECONDARY DATA

    1. Official publications
    2. Journals & newspapers
    3. Research organizations

    VARIABLE

    Any factor that varies
    Any quantity that varies
    Any collected piece of information that varies
    A characteristic of the individuals of a population or sample which varies from individual to individual.

    Examples: age, weight, income

    The variable "age of a person" can take different values, because a person can be 20 years old, 35 years old and so on.

    It is the basic unit for performing research. All medical research is the study of relationships among variables.

    It provides the yardstick on which the effects of treatments or experiences are measured; it is the characteristic of interest in a study.

    TYPES OF VARIABLES

    Classified according to the form of the characteristic of interest

    NUMERICAL / QUANTITATIVE VARIABLES

    Variables whose values are expressed in numbers

    Examples: age, weight, number of children, monthly income

    CATEGORICAL/ QUALITATIVE VARIABLES

    Variables whose values are expressed in categories

    Examples:
    Color: red, blue and green
    Outcome of disease: recovery, chronicity and death
    Some categorical variables limit the choice of answers to yes or no.

    DEPENDENT VARIABLES

    The variable that is used to describe the problem under study

    Also called: Effect Variable

    Example:

    In a study of the relationship between mother's education and malnutrition in children, malnutrition is the dependent variable.

    INDEPENDENT VARIABLES

    The variables that are used to describe the factors that cause or influence the problem under study

    Also called: Cause Variable

    Example:

    In a study of the relationship between smoking and lung cancer, smoking is the independent variable (with values varying from not smoking to smoking more than 3 packets/day).

    CONFOUNDING VARIABLE

    A variable whose nuisance effect distorts the true relationship between the independent variable (exposure) and the dependent variable (disease/outcome)

    Also known as an intervening, background or contaminating variable. It confuses our research: it shows up in the results but is not the real variable of interest.

    Example:

    Mother's education: independent variable
    Malnutrition: dependent variable
    Family income: confounding variable

    Common confounding variables: age, sex, socio-economic status

    INDICATOR

    A variable with characteristics of quality, quantity or time.

    It operationalizes (defines) a variable, making it measurable; it is the measuring tool of the variable.

    Example:

    Variable: household income

    Indicators:
    high income (Rs. 5000 and above per month)
    middle income (Rs. 2000-4999 per month)
    low income (less than Rs. 2000 per month)

    HEALTH INDICATOR

    An indicator which measures different dimensions of, or changes in, health.

    Example:

    Number of deaths due to child bearing & puerperium among total live births in a year (Maternal mortality rate)

    Analysis between Demographic and Research variables: Tests of Significance / Hypothesis testing

    Analysis between Research variables: Tests of Correlation and Regression

    Assessment of the relationship between two Research variables depends upon the purpose of the research:

    degree of relationship: Correlation
    prediction (forecasting): Regression

    Correlation & Regression are two statistical techniques used to define the relationship between two different variables when measured on the same people in a study.

    Correlation: a statistical tool that tells us how close the relationship between variables is.

    For example: the age & weight relationship of boys, to study whether a high value in age corresponds to a high value in weight.

    Regression: a statistical method that uses the relationship between 2 or more variables such that the value of one variable can be predicted from the value of the other. Regression analysis is the methodology used for the purpose of prediction.

    One variable is considered to be the predicted variable; its value varies according to the predictor variable.

    For example:

    predicted variable: marks obtained in exam
    predictor variable: time spent on study

    predicted variable: yield of crops
    predictor variable: amount of rainfall
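
The distinction between correlation (degree of relationship) and regression (prediction) can be sketched in a few lines of Python. The data below are hypothetical numbers invented purely for illustration (hours of study and exam marks), and the function names `pearson_r` and `fit_line` are assumptions of this sketch; it uses Pearson's correlation coefficient and a simple least-squares line, which are common choices but are not named in the notes.

```python
import math

# Hypothetical illustration data (NOT from the notes).
hours_studied = [1, 2, 3, 4, 5]       # predictor variable
exam_marks = [52, 55, 61, 64, 68]     # predicted variable


def _sums(x, y):
    """Return the means and the sums of squares/products needed below."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation: how closely the two variables move together (-1 to +1)."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Regression: least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(hours_studied, exam_marks)
slope, intercept = fit_line(hours_studied, exam_marks)
predicted_marks_6h = intercept + slope * 6   # prediction for 6 hours of study

print(f"r = {r:.3f}, marks = {intercept:.1f} + {slope:.1f} * hours")
```

Here `r` answers "how close is the relationship?", while the fitted line answers "what value of the predicted variable do we expect for a given predictor value?".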

    POPULATION (universe)

    A large collection of items that have a characteristic in common

    The items may be people, animals, plants or things.

    It is the entire group we are interested in, which we wish to describe or draw conclusions about.

    It is the entire group about which some specific information is required or recorded.

    Examples: students in a class, chairs in a class, books in a library, fishes in a lake

    SAMPLE

    A subset of the population which is chosen for investigation

    For each population, there are many possible samples. By studying the sample, it is hoped to draw conclusions about the population.

    A sample is a window through which the researcher can see the entire population.

    For example: a drop of blood (sample) will tell us about body (population) chemistry.

    PARAMETER

    A value associated with a population
    Any quantity which defines a characteristic of the whole population

    It is assigned a GREEK letter (e.g. $\mu$).
    This value is unknown, and therefore has to be estimated.
    A parameter is a fixed value which does not vary.

    SAMPLE STATISTIC

    A value calculated from a sample
    Any quantity which defines a characteristic of a sample

    It is assigned a ROMAN letter (e.g. $\bar{X}$).
    This value is used to give information about the unknown value in the corresponding population (parameter).

    INFERENTIAL STATISTICS

    The process of drawing conclusions (inferences) about a population using information (data) from samples.

    There are two approaches:

    Estimation of parameter
    Hypothesis testing

    ESTIMATION OF PARAMETER

    A procedure to estimate the unknown value of a parameter by a Point Estimate or an Interval Estimate (Confidence Interval)

    Point Estimate:

    A single value is calculated to estimate the population value (parameter).
    Example: $\bar{X}$ of a sample is a point estimate of $\mu$ (the population value).

    Confidence Interval:

    Range of values within which the parameter (population value) is likely to lie

    CONFIDENCE INTERVAL

    Sample statistic values vary from sample to sample.

    A Confidence Interval tells us how good the estimate of the parameter (population value) is, on the basis of the information provided by the Sample Statistic.

    It is a measure of the precision with which we can pinpoint the estimate of the parameter.

    CI is calculated on the basis of the SE (Standard Error), which allows us to create a CI at a specified level of probability.

    CI is constructed at Confidence Levels (CL).

    CL: tells you how sure you can be

    There are 4 typical Confidence Levels: 99%, 98%, 95%, 90%

    Most researchers use the 95% CL.

    For example:

    A 95% Confidence Interval means that we are 95% confident that the parameter lies within the Confidence Limits (the upper & lower limits of the Confidence Interval): if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the parameter and about 5% would not.
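
As a rough illustration of how a CI is built from the SE, the sketch below uses the large-sample (normal approximation) form, mean ± z × SE with SE = SD / √n. The function name `confidence_interval` and the summary figures passed to it are hypothetical; the z multipliers are the standard normal values for the four confidence levels listed above. For small samples a t multiplier would normally be used instead, which this sketch does not cover.

```python
import math

# Standard normal multipliers for the four typical confidence levels.
Z_VALUES = {90: 1.645, 95: 1.960, 98: 2.326, 99: 2.576}


def confidence_interval(mean, sd, n, level=95):
    """Return (lower limit, upper limit) for a mean, normal approximation."""
    se = sd / math.sqrt(n)            # standard error of the mean
    margin = Z_VALUES[level] * se     # half-width of the interval
    return mean - margin, mean + margin


# Hypothetical sample summary: mean 120, SD 15, sample size 100.
lower, upper = confidence_interval(120, 15, 100, level=95)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

A higher confidence level gives a wider interval for the same data, and a larger sample gives a narrower one.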

    DATA SUMMARIZATION

    Arrangement pattern of data

    CENTRAL: tendency of data points to cluster in the center
    SPREAD: tendency of data points to disperse towards the periphery

    A summary measurement expresses this pattern as a single measure:

    measure of central tendency (indicates centrality of data)
    measure of dispersion (indicates scattering of data)

    MEASURES OF CENTRAL TENDENCY

    Summary statistics that describe the tendency of observations to cluster in the central part of a data set.

    The most common measures: Mean, Median & Mode

    MEAN

    Arithmetic average of a distribution of values

    Statistically, the mean is the sum of all scores divided by the number of scores.

    Mathematically, it is expressed as:

    Mean of sample:

    $$\bar{X} = \frac{\sum X}{n}$$

    $\bar{X}$ (X bar) = sample mean
    $\sum$ (capital sigma) = summation operator
    $X$ = each individual score (sample value)
    $n$ = total number of scores (sample size)

    Example:

    Individuals (sample size n = 4): 1, 2, 3, 4
    Systolic blood pressure, X (individual observations): 120, 150, 110, 100

    $$\bar{X} = \frac{120 + 150 + 110 + 100}{4} = \frac{480}{4} = 120$$

    Mean of population:

    $$\mu = \frac{\sum x}{N}$$

    $\mu$ (mu) = population mean
    $x$ = each individual observation in the population (population value)
    $N$ = number of population members (population size)
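
The sample-mean calculation can be checked with a short Python sketch using the four systolic blood pressure readings from the notes; the variable names are illustrative only.

```python
from statistics import mean

# Systolic blood pressure readings from the example in the notes.
systolic_bp = [120, 150, 110, 100]

# Sum of all scores divided by the number of scores.
sample_mean = sum(systolic_bp) / len(systolic_bp)

# The standard-library helper gives the same answer.
library_mean = mean(systolic_bp)

print(sample_mean)  # 120.0
```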

    MEDIAN

    Middle value when the observations are arranged in order

    Ordered data can be in ascending or descending order.

    If the number of observations in a data set is odd, the middle-most value is chosen as the median; if it is even, the average of the two middle values is the median.

    It is useful in asymmetrical (skewed) distributions of data.
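
The odd/even rule can be sketched as a small Python function. The function name `median_value` is an assumption of this sketch; the two data sets are the blood pressure readings and the children's weights that appear as examples elsewhere in these notes.

```python
def median_value(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: take the middle-most value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]   # 11 values (odd)
systolic_bp = [120, 150, 110, 100]                          # 4 values (even)

print(median_value(weights_kg))    # 16
print(median_value(systolic_bp))   # 115.0
```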

    MODE

    Most frequently occurring value in a data set

    French word meaning "fashion"

    A data set may have no mode or may have many modes.

    It is occasionally used for describing a single distribution of data.
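
A minimal Python sketch of finding the mode, covering the "no mode" and "many modes" cases mentioned above. The function name `modes` and the convention of returning an empty list when every value occurs only once are choices made for this sketch, not something defined in the notes.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s); empty if there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

print(modes(weights_kg))              # [16]
print(modes([120, 150, 110, 100]))    # []
print(modes([1, 1, 2, 2, 3]))         # [1, 2]
```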

    MEASURES OF VARIATION

    Also known as DISPERSION or SCATTER

    It is defined as the extent to which values in a sample or population vary about their mean.

    The most common measures:

    Absolute measures (compare absolute accuracy of data): Range, Variance, Standard Deviation

    Relative measure (compares relative accuracy of data): Coefficient of Variation

    RANGE

    It is the difference between the maximum and minimum values in a series.

    It is the maximum value minus the minimum value.

    $$R = R_2 - R_1$$

    where $R_2$ is the maximum value and $R_1$ is the minimum value.
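
For instance, the range of the four systolic blood pressure readings used in the mean example can be computed as follows (variable names are illustrative):

```python
# Systolic blood pressure readings from the mean example in the notes.
systolic_bp = [120, 150, 110, 100]

# Range = maximum value minus minimum value.
data_range = max(systolic_bp) - min(systolic_bp)

print(data_range)  # 50
```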

    STANDARD DEVIATION

    It is a measure of the dispersion of a data set.

    It is the index of variability (spread) of the data about their Mean.
    It tells us how much variability can be expected among individual values.
    It is expressed in the same units of measurement as the original data, and is thus more meaningful, as the square is eliminated.

    Larger the Standard Deviation: greater the dispersion
    Lesser the Standard Deviation: values are close to the Mean

    Statistically defined as the square root of the Variance.

    Formula for sample standard deviation:

    $$S = \sqrt{V}$$

    or

    $$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

    where $\bar{x}$ is the sample mean and $n$ is the sample size.

    Formula for population standard deviation:

    $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

    The steps to calculate the standard deviation are:

    1. Calculate the mean of all measurements.
    2. Calculate the difference between each individual measurement and the mean.
    3. Square all these differences.
    4. Take the sum of all squared differences.
    5. Divide this sum by the number of measurements minus one (n - 1).
    6. Finally, take the square root of the value obtained.

    Example:

    11 children of 3 years of age were weighed. Their weights were:

    13, 14, 14, 15, 16, 16, 16, 17, 17, 18 and 20 kilograms.

    The number of measurements n is 11.

    To calculate the standard deviation:

    1. First calculate the mean, which is 16 kg.

    2. Next calculate the deviation of each measurement from the mean. Ignoring sign, these are:

    3, 2, 2, 1, 0, 0, 0, 1, 1, 2, 4.

    These values are then squared:

    9, 4, 4, 1, 0, 0, 0, 1, 1, 4, 16.

    3. The sum of these squared deviations is 40.

    4. This sum is divided by the total number of measurements minus one (n - 1):

    40 / (11 - 1) = 4

    5. Finally, take the square root to obtain the standard deviation:

    $$S = \sqrt{4} = 2 \text{ kg}$$
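
The worked example can be reproduced step by step in Python. This is a sketch using the eleven weights given above; the variable names are illustrative, and `statistics.stdev` from the standard library (which also divides by n - 1) is used only as a cross-check.

```python
import math
from statistics import stdev

# Weights (kg) of the 11 children in the example.
weights_kg = [13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 20]

n = len(weights_kg)
mean_weight = sum(weights_kg) / n                              # mean
squared_deviations = [(w - mean_weight) ** 2 for w in weights_kg]
sum_of_squares = sum(squared_deviations)                       # sum of squared deviations
variance = sum_of_squares / (n - 1)                            # divide by n - 1
sd = math.sqrt(variance)                                       # square root

library_sd = stdev(weights_kg)                                 # cross-check

print(mean_weight, sum_of_squares, variance, sd)  # 16.0 40.0 4.0 2.0
```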

    STANDARD ERROR

    Also called the Standard Error of the Mean

    If you take more than one sample from the same population, the samples will yield different Means. The variation in these sample Means is called the Standard Error.

    It is defined as the measure of the extent to which the sample mean deviates from the population mean.

    It measures inter-sample variability.
    It tells us how much variability can be expected among sample means.

    $$SE = \frac{SD}{\sqrt{n}}$$
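
As a small illustration, the standard error can be computed for the children's weights example in these notes (SD = 2 kg, n = 11), assuming the usual form SE = SD / √n. The function name `standard_error` is an assumption of this sketch.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)


# SD and sample size from the weights example (2 kg, 11 children).
se = standard_error(2, 11)
print(round(se, 3))  # 0.603
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.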

    STANDARD ERROR OF PROPORTION

    In dealing with qualitative data, the Mean and SD are not applicable, so the SE of the Mean cannot be used.

    In this situation the SE of PROPORTION is applicable:

    $$SE_{\text{proportion}} = \sqrt{\frac{pq}{n}}$$

    where
    $p$ = proportion
    $q$ = $1 - p$
    $n$ = sample size
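
A minimal sketch of the SE of a proportion, assuming the formula SE = √(pq / n). The figures used (20 of 100 patients showing some characteristic) are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise illustrative.

```python
import math


def se_of_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n) with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


# Hypothetical example: 20 out of 100 patients have the characteristic.
p = 20 / 100
se_p = se_of_proportion(p, 100)
print(round(se_p, 2))  # 0.04
```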

    COEFFICIENT OF VARIATION

    It is a relative measure of dispersion.

    It is used to overcome the difficulty of comparing the dispersion of data sets when their units of measurement are different.

    Statistically speaking, it is the standard deviation of the distribution expressed as a percentage of the mean of the distribution:

    $$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
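
For example, using the figures from the children's weights example in these notes (mean 16 kg, SD 2 kg), the coefficient of variation could be computed as below; the function name is an assumption of this sketch.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


# Figures from the weights example: SD = 2 kg, mean = 16 kg.
cv_weights = coefficient_of_variation(2, 16)
print(cv_weights)  # 12.5
```

Because the result is a percentage with no units, it can be compared across variables measured in different units, for example weight in kilograms and height in centimetres.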

    DEGREE OF FREEDOM

    As most of our statistics are done on samples, we cannot be 100% sure; therefore, to make a conservative estimate, we use the divisor $n - 1$ instead of $n$ when averaging the deviations.

    It is defined as:

    a measure of variability which expresses the number of options available within a space
    a number which tells us how many of the values may be independently chosen

    It is used in the calculation of variance / SD, the t-test and the chi-square test.

    PROBABILITY

    Probability means the chance of something happening.

    It is a quantitative measure of how likely each possible outcome of a particular event is.

    Event            Possible outcomes
    Rolling a die    1, 2, 3, 4, 5, 6
    Tossing a coin   heads, tails
    Drawing a card   52 cards

    If an outcome is sure to occur: probability 1 (certain event)

    If an outcome cannot occur: probability 0 (null event)

    Range of probability: 0 to 1

    Zero = no chance
    One = full certainty

    Probability can also be defined as the relative frequency of occurrence of an event.

    Frequency = number of times a particular score is achieved

    $$\text{Relative Frequency} = \frac{\text{frequency of score}}{\text{total number of scores}}$$

    The concept that all men are sure to die is expressed as 100%, P = 1.0

    All other probabilities are measured against this standard:

    1 chance in 100 = 1%, P = 0.01

    1 chance in 500 = 0.2%, P = 0.002

    1 chance in 1000 = 0.1%, P = 0.001

    Example:

    A treatment for cancer has a 90% success rate; the remaining 10% of patients die.

    If two patients come for treatment, what is the probability that exactly one will die? (The two patients are treated as independent.)

    The probability of each patient dying: 0.1 (1/10)

    The probability of each patient not dying: 0.9 (9/10)

    The probability of both dying: 0.1 × 0.1 = 0.01

    The probability of both recovering: 0.9 × 0.9 = 0.81

    The balance of probability: 1 minus the probability of the other outcomes

    1 - (0.81 + 0.01) = 1 - 0.82 = 0.18

    The probability that exactly one will die = 0.18
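
The same answer can be reached by listing every combination of outcomes for the two patients and adding up the probabilities. The sketch below assumes, as the example does, that the two patients' outcomes are independent; the variable names are illustrative.

```python
from itertools import product

p_die = 0.1
p_survive = 1 - p_die
outcome_probability = {"die": p_die, "survive": p_survive}

# Probability of 0, 1 or 2 deaths among the two patients.
prob_by_deaths = {0: 0.0, 1: 0.0, 2: 0.0}
for first, second in product(outcome_probability, repeat=2):
    deaths = [first, second].count("die")
    # Independent events: multiply the individual probabilities.
    prob_by_deaths[deaths] += outcome_probability[first] * outcome_probability[second]

for deaths, prob in prob_by_deaths.items():
    print(deaths, round(prob, 2))
```

The "exactly one dies" case appears twice in the enumeration (first patient dies, or second patient dies), which is why it equals 2 × 0.1 × 0.9 = 0.18, matching the balance-of-probability calculation.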

    HYPOTHESIS

    A statement of prediction

    A Statistical Hypothesis is defined as a statement of belief used in the evaluation of a population parameter, such as the mean of a population.

    NULL HYPOTHESIS

    It is the hypothesis that the samples or populations being compared in an experiment, study or test are similar. Any difference that appears is due to chance and not due to any other measurable factor.

    It simply means the status quo.

    The Null Hypothesis is comparable to the law courts assuming innocence until guilt is demonstrated.

    HYPOTHESIS TESTING / SIGNIFICANCE TESTING

    To test the viability of the Null Hypothesis in the light of experimental data.

    STATISTICAL SIGNIFICANCE

    It means probably true, likely to be real.

    Defined as:

    A procedure by which sample results are used to decide whether to accept or reject a Null Hypothesis.

    When the evidence obtained from the sample is not compatible with the Null Hypothesis, the result is STATISTICALLY SIGNIFICANT.

    The decision is based on the p-value.

    Small p-value (less than 0.05): low degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is rejected; the test is statistically significant.

    Large p-value (greater than 0.05): high degree of compatibility between the Null Hypothesis and the observed data; the Null Hypothesis is accepted (not rejected); the test is not statistically significant.

    p-VALUE

    It measures the strength of statistical evidence in a scientific study.
    It is the probability that the observed result arose by chance alone, that is, of observing a result at least as extreme if the Null Hypothesis were true.
    It is a probability statement which measures the strength of evidence against the Null Hypothesis.

    If p = 0.05, it means that there is a 5 in 100 (1 in 20) chance that a result like this would occur by chance alone.

    There are many different statistical tests to get a p-value:

    1. Chi-square test
    2. Student's t-test
    3. Z test
    4. ANOVA test

    The test used to calculate the p-value depends upon the type of data:

    for quantitative data: t-test
    for qualitative data: chi-square test
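
To make the decision rule concrete, here is a sketch of a chi-square test on a 2 × 2 table of qualitative data, written with the standard library only. The counts are hypothetical, the function names are assumptions of this sketch, and no continuity correction is applied. The p-value line relies on the fact that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(√(χ²/2)); it would not be valid for larger tables.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts (NOT from the notes):
#                recovered   not recovered
# treatment A        30            10
# treatment B        20            20
chi2 = chi_square_2x2(30, 10, 20, 20)
p = p_value_df1(chi2)
significant = p < 0.05    # reject the Null Hypothesis if p is below 0.05

print(f"chi-square = {chi2:.2f}, p = {p:.3f}, significant: {significant}")
```

With these made-up counts the p-value falls below 0.05, so under the rule above the Null Hypothesis (no difference between the treatments) would be rejected.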
