1 Descriptive Studies


  • 8/8/2019 1 Descriptive Studies

    1/65

    Descriptive Studies


    Statistical methods fall into two broad areas:

    Descriptive statistics

    Inferential statistics


    Descriptive statistics

    Descriptive statistics merely describe, organize, or summarize data; they refer only to the actual data available.

    Examples include the mean blood pressure of a group of patients and the success rate of a surgical procedure.


    Inferential statistics

    Inferential statistics involve making inferences that go beyond the actual data.

    They usually involve inductive reasoning (i.e., generalizing to a population after having observed only a sample).

    Examples include the mean blood pressure of all Americans and the expected success rate of a surgical procedure in patients who have not yet undergone the operation.


    POPULATIONS, SAMPLES, AND ELEMENTS


    A population is the universe about which an investigator wishes to draw conclusions; it need not consist of people but may be a population of measurements.

    Strictly speaking, if an investigator wants to draw conclusions about the blood pressure of Americans, the population consists of the blood pressure measurements, not the Americans themselves.


    A sample is a subset of the population: the part that is actually being observed or studied.

    Because researchers rarely can study whole populations, inferential statistics are almost always needed to draw conclusions about a population when only a sample has actually been studied.

    A single observation, such as one person's blood pressure, is an element, denoted by X.

    The number of elements in a population is denoted by N, and the number of elements in a sample by n.

    A population therefore consists of all the elements from X1 to XN, and a sample consists of n of these N elements.


    Most samples used in biomedical research are probability samples: samples in which the researcher can specify the probability of any one element in the population being included.

    For example, if someone is picking a sample of 4 playing cards at random from a pack of 52 cards, the probability that any one card will be included is 4/52.

    Probability samples permit the use of inferential statistics, whereas non-probability samples allow only descriptive statistics to be used.

    There are four basic kinds of probability samples:

    Simple random samples

    Stratified random samples

    Cluster samples

    Systematic samples


    Simple random samples

    The simple random sample is the simplest kind of probability sample.

    It is drawn in such a way that every element in the population has an equal probability of being included, such as in the playing card example above.

    A random sample is defined by the method of drawing the sample, not by the outcome.

    If four hearts were picked out of the pack of cards, this does not in itself mean that the sample is not random.
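
    The playing card example can be sketched in a few lines of Python. This is only an illustration: the function name, the card representation, and the seed are assumptions made for the sketch, not part of the slides.

```python
import random
from fractions import Fraction

# A 52-card pack: 13 ranks in each of 4 suits.
SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
pack = [(rank, suit) for suit in SUITS for rank in RANKS]


def draw_simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Probability that any one card is included in a sample of 4 cards.
inclusion_probability = Fraction(4, len(pack))

hand = draw_simple_random_sample(pack, 4, seed=1)  # the seed is arbitrary
print(hand)
print(inclusion_probability)  # 4/52 in lowest terms
```

    Whatever cards come out, the sample is random because of how it was drawn; a hand of four hearts would be unrepresentative but still random.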


    A sample is representative if it closely resembles the population from which it is drawn.

    All types of random samples tend to be representative, but they cannot guarantee representativeness.

    Nonrepresentative samples can cause serious problems. (Four hearts are clearly not representative of all the cards in a pack.)


    A sample or a result demonstrates bias if it consistently errs in a particular direction.

    For example, in drawing a sample of 10 from a population consisting of 500 white people and 500 black people, a sampling method that consistently produces more than 5 white people would be biased.

    Biased samples are therefore unrepresentative, and true randomization is proof against bias.


    Stratified random samples

    In a stratified random sample, the population is first divided into relatively internally homogeneous groups, or strata, from which random samples are then drawn.

    This stratification results in greater representativeness.

    For example, instead of drawing one sample of 10 people from a total population consisting of 500 white and 500 black people, one random sample of 5 could be taken from each ethnic group (or stratum) separately, thus guaranteeing the racial representativeness of the sample.
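
    A minimal sketch of the stratified draw described above, assuming a hypothetical population stored as one list of identifiers per stratum (the identifiers and function name are invented for illustration):

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical population: 500 members in each of two strata.
strata = {
    "white": [f"white_{i}" for i in range(500)],
    "black": [f"black_{i}" for i in range(500)],
}

sample = stratified_sample(strata, 5, seed=0)  # 5 per stratum, 10 in total
for name, members in sample.items():
    print(name, members)
```

    Because the sample size is fixed within each stratum, the 5/5 split holds on every draw, which a single simple random sample of 10 cannot guarantee.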


    Cluster samples

    Cluster samples may be used when it is too expensive or laborious to draw a simple random or stratified random sample.

    For example, in a survey of 100 medical students in the United States, an investigator might start by selecting a random set of groups or "clusters" (such as a random set of 10 U.S. medical schools) and then interviewing all the students in those 10 schools.

    This method is much more economical and practical than trying to take a random sample of 100 directly from the population of all U.S. medical students.
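
    The two-step logic of cluster sampling (pick whole clusters at random, then take everyone in them) might be sketched as follows. The school names and the figure of 10 students per school are purely hypothetical, chosen so that 10 schools yield 100 students.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random, then keep every element in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


# Hypothetical data: 40 schools with 10 students each.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_student_{i}" for i in range(10)]
    for s in range(40)
}

chosen_schools, students = cluster_sample(schools, 10, seed=0)
print(chosen_schools)
print(len(students))
```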


    Systematic samples

    These involve selecting elements in a systematic way, such as every fifth patient admitted to a hospital or every third baby born in a given area.

    This type of sampling usually provides the equivalent of a simple random sample without actually using randomization.
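
    Selecting "every fifth patient" amounts to stepping through an ordered list at a fixed interval. A small sketch, with hypothetical admission numbers standing in for patients:

```python
def systematic_sample(elements, k, start=0):
    """Take every k-th element, beginning at index `start`."""
    return elements[start::k]


# Hypothetical admission numbers 1..30, in order of admission.
admissions = list(range(1, 31))

# Start at index 4 so that the 5th, 10th, 15th, ... admissions are selected.
every_fifth = systematic_sample(admissions, 5, start=4)
print(every_fifth)  # [5, 10, 15, 20, 25, 30]
```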


    Sampling problems are common in clinical research.

    For example, if a researcher advertises in a newspaper to recruit people suffering from a particular problem (whether it is acne, diabetes, or depression), the people who respond form a self-selected sample, which is probably not representative of the population of all people with this problem.

    Similarly, if a dermatologist reports on the results of a new treatment for acne which he has been using with his patients, the sample may not be representative of all people with acne, as it is likely that only people with more severe acne (or with good insurance coverage!) seek treatment from a dermatologist.


    In any case, his practice is probably limited to people in a particular geographic, climatic, and possibly ethnic area.

    In this case, although his study may be valid as far as his patients are concerned (this is called internal validity), it may not be valid to generalize his findings to people with acne in general (so the study may lack external validity).


    PROBABILITY

    The probability of an event is denoted by p.

    Probabilities are usually expressed as decimal fractions (sometimes as percentages) and, as fractions, must lie between zero (zero probability) and one (absolute certainty).

    The probability of an event cannot be negative.

    The probability of an event can also be expressed as a ratio of the number of outcomes in which the event occurs to the number of equally likely possible outcomes.


    For example, if a fair coin were tossed an infinite number of times, heads would appear on 50% of the tosses; therefore, the probability of heads, or p (heads), is 0.50.

    If a random sample of 10 people were drawn an infinite number of times from a population of 100 people, each person would be included in the sample 10% of the time; therefore, p (being included in any one sample) is 0.10.

    The probability of an event not occurring is equal to one minus the probability that it will occur; this is denoted by q.

    In the above example, the probability of any one person not being included in any one sample, q, is therefore (1 - p) = (1 - 0.10) = 0.90.
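
    The p and q of the sampling example can be checked with exact fractions. This is a small sketch; the function name is an assumption made for illustration.

```python
from fractions import Fraction


def inclusion_probability(sample_size, population_size):
    """p: probability that one given element lands in a simple random sample."""
    return Fraction(sample_size, population_size)


p = inclusion_probability(10, 100)  # sample of 10 from a population of 100
q = 1 - p                           # probability of NOT being included

print(float(p), float(q))  # 0.1 0.9
```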


    There are three main methods of calculating probability:

    The ADDITION rule

    The MULTIPLICATION rule

    The BINOMIAL DISTRIBUTION


    Addition rule

    The addition rule of probability states that the probability of any one of several particular events occurring is equal to the sum of their individual probabilities, provided the events are mutually exclusive; i.e., they cannot both happen.

    Because the probability of picking a heart card from a deck of cards is 0.25, this rule states that the probability of picking a card that is either a diamond or a heart is 0.25 + 0.25 = 0.50. Because no card can be both a heart and a diamond, these events meet the requirement of mutual exclusiveness.
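
    As a quick check of the heart-or-diamond arithmetic, here is a sketch using exact fractions. The helper name is hypothetical, and the function simply trusts the caller that the events are mutually exclusive.

```python
from fractions import Fraction


def addition_rule(*probabilities):
    """P(A or B or ...) for events the caller asserts are mutually exclusive."""
    total = sum(probabilities)
    if total > 1:
        raise ValueError("sum exceeds 1: the events cannot be mutually exclusive")
    return total


p_heart = Fraction(13, 52)    # 13 hearts in a 52-card deck
p_diamond = Fraction(13, 52)  # 13 diamonds in a 52-card deck

p_heart_or_diamond = addition_rule(p_heart, p_diamond)
print(float(p_heart_or_diamond))  # 0.5
```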


    Multiplication rule

    The multiplication rule of probability states that the probability of two or more statistically independent events all occurring is equal to the product of their individual probabilities.

    If the lifetime probability of a person developing cancer is 0.25, and the lifetime probability of developing schizophrenia is 0.01, the lifetime probability that a person might have both cancer and schizophrenia is 0.25 x 0.01 = 0.0025, provided that the two illnesses are independent; in other words, that having one illness neither increases nor decreases the risk of having the other.
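
    The product above can be reproduced with a short sketch. The helper name is an assumption, and independence of the events is taken on trust from the caller, exactly as the rule requires.

```python
from fractions import Fraction


def multiplication_rule(*probabilities):
    """P(A and B and ...) for events the caller asserts are independent."""
    product = Fraction(1)
    for probability in probabilities:
        product *= probability
    return product


p_cancer = Fraction("0.25")         # lifetime probability from the example
p_schizophrenia = Fraction("0.01")  # lifetime probability from the example

p_both = multiplication_rule(p_cancer, p_schizophrenia)
print(float(p_both))  # 0.0025
```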


    Binomial Distribution

    The probability that a specific combination of mutually exclusive independent events will occur can be determined by the use of the binomial distribution.

    A binomial distribution is one in which there are only two possibilities, such as yes/no, male/female, healthy/sick.

    If an experiment has exactly two outcomes, one of which is generally termed success, the binomial distribution gives the probability of obtaining an exact number of successes in a series of independent trials.


    A typical use of the binomial distribution is in genetic counseling.

    Inheritance of a disorder such as phenylketonuria follows a binomial distribution: there are two possible events, inheriting the disease and not inheriting the disease; and the possibilities are independent (if a child in a family inherits the disorder, this does not affect the chance of another child inheriting it).


    A physician could therefore use the binomial distribution to inform a couple who are carriers of the disease how probable it is that some specific combination of events might occur, such as the probability that, if they are to have two children, neither will inherit the disease.
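
    The two-children question can be worked through with the binomial formula, P(k successes in n trials) = C(n, k) p^k (1 - p)^(n - k). The sketch below assumes a per-child probability of 1/4, the usual figure for an autosomal recessive disorder when both parents are carriers; that number is an assumption added here and is not stated in the slides.

```python
from fractions import Fraction
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumption: each child of two carriers has a 1/4 chance of inheriting the disease.
p_inherit = Fraction(1, 4)

# Probability that neither of two children inherits the disease (k = 0, n = 2).
p_neither = binomial_probability(0, 2, p_inherit)
print(p_neither, float(p_neither))  # 9/16 0.5625
```

    The same function gives the probability of exactly one affected child (k = 1) or two (k = 2); the three probabilities add up to one.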


    Types of Data


    The choice of an appropriate statistical technique depends upon the type of data in question.

    Data fall on one of four scales of measurement:

    Nominal

    Ordinal

    Interval

    Ratio


    Nominal scale data

    Nominal scale data are divided into qualitative categories or groups such as male/female, urban/rural, or red/green.

    There is no implication of order or ratio.

    Nominal data that fall under only two groups are called dichotomous data.


    Ordinal scale data

    Ordinal scale data can be placed in a meaningful order; e.g., the ranking of students.

    However, there is no information about the size of the interval; no conclusion can be drawn about whether the difference between the first and second students is the same as that between the second and third.


    Interval scale data

    Interval scale data are like ordinal data in that they can be placed in a meaningful order.

    In addition, they have meaningful intervals between items, which are usually measured quantities, e.g., a temperature scale.

    However, because interval scales do not have an absolute zero, ratios of scores are not meaningful; e.g., 100 °C is not twice as hot as 50 °C.


    Ratio scale data

    A ratio scale has the same properties as an interval scale; in addition, meaningful ratios exist because there is an absolute zero.

    Most biomedical variables form a ratio scale: weights in pounds, time in seconds or days, blood pressure in mm of Hg, and pulse rate in beats per minute are all ratio data.

    A pulse rate of zero indicates absolute lack of pulse. Therefore it is correct to say that a pulse rate of 120 BPM is twice that of 60 BPM.
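
    One way to see the difference between interval and ratio scales is to change units: a ratio on a ratio scale survives the change, while a ratio on an interval scale does not. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the variable names are invented for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Interval scale: the apparent "ratio" depends on the arbitrary zero point.
celsius_ratio = 100 / 50                                                    # 2.0
fahrenheit_ratio = celsius_to_fahrenheit(100) / celsius_to_fahrenheit(50)   # 212 / 122

# Ratio scale: pulse rate has a true zero, so the ratio is unit-independent.
pulse_ratio_per_minute = 120 / 60
pulse_ratio_per_second = (120 / 60) / (60 / 60)  # same rates in beats per second

print(celsius_ratio, round(fahrenheit_ratio, 3))
print(pulse_ratio_per_minute, pulse_ratio_per_second)
```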


    Discrete variables

    Discrete variables can take only certain values and nothing in between.

    For example, the number of patients in a hospital census may be 200 or 201, but it cannot be in between these two; the number of syringes used in a clinic on any given day may increase or decrease only by units of one.


    Continuous variables

    Continuous variables may take any value (typically between certain limits).

    Most biomedical variables are continuous (e.g., a patient's weight, height, age, and blood pressure).

    However, the process of measuring or reporting continuous variables will reduce them to discrete variables.

    Blood pressure may be reported to the nearest whole millimeter of mercury, weight to the nearest pound, and age to the nearest year.


    FREQUENCY DISTRIBUTIONS


    A set of unorganized data is difficult to digest and understand.

    Consider a study of the serum cholesterol levels of a sample of 200 men: a list of the 200 levels would be of little value in itself.

    A simple first way of organizing the data is to list all the possible values between the highest and the lowest in order, recording the frequency (f) with which each score occurs.

    This forms a frequency distribution.

    If the highest serum cholesterol level were 260 mg/dl and the lowest were 161 mg/dl, the frequency distribution would list each of the 100 values from 161 to 260 together with its frequency.


    Grouped frequency distributions

    Data can be made more manageable by creating a grouped frequency distribution.

    Individual scores are grouped (between 5 and 20 groups are usually appropriate).

    Each group of scores encompasses an equal class interval.

    In this example there are 10 groups with a class interval of 10 (161 to 170, 171 to 180, and so on).
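
    Assigning scores to class intervals of width 10 starting at 161 can be sketched as below. The ten cholesterol values are made up for illustration; they are not the 200 values from the study.

```python
from collections import Counter


def class_interval(value, lowest=161, width=10):
    """Return the (lower, upper) class interval that contains `value`."""
    index = (value - lowest) // width
    lower = lowest + index * width
    return (lower, lower + width - 1)


# Hypothetical serum cholesterol levels in mg/dl.
levels = [163, 175, 188, 204, 204, 209, 215, 218, 233, 256]

grouped = Counter(class_interval(level) for level in levels)
for (lower, upper), f in sorted(grouped.items(), reverse=True):
    print(f"{lower}-{upper}: {f}")
```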


    Interval    Frequency (f)    Relative f (% rel f)    Cumulative f (% cum f)

    251-260            5                 2.5                    100.0
    241-250           13                 6.5                     97.5
    231-240           19                 9.5                     91.0
    221-230           18                 9.0                     81.5
    211-220           38                19.0                     72.5
    201-210           72                36.0                     53.5
    191-200           14                 7.0                     17.5
    181-190           12                 6.0                     10.5
    171-180            5                 2.5                      4.5
    161-170            4                 2.0                      2.0


    Relative frequency distributions

    A grouped frequency distribution can be transformed into a relative frequency distribution, which shows the percentage of all the elements that fall within each class interval.

    The relative frequency of elements in any given class interval is found by dividing f, the frequency (or number of elements) in that class interval, by n (the sample size, which in this case is 200).

    By multiplying the result by 100, it is converted into a percentage.

    Thus, this distribution shows, for example, that 19% of this sample had serum cholesterol levels in the 211-220 mg/dl class interval.
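
    The relative frequency column of the table can be recomputed from the frequency column with the rule just described (f divided by n, times 100). A minimal sketch, with the frequencies copied from the table:

```python
# Class intervals (lowest first) and their frequencies, taken from the table.
intervals = ["161-170", "171-180", "181-190", "191-200", "201-210",
             "211-220", "221-230", "231-240", "241-250", "251-260"]
frequencies = [4, 5, 12, 14, 72, 38, 18, 19, 13, 5]

n = sum(frequencies)  # sample size

# Relative frequency as a percentage: 100 * f / n.
relative = [100 * f / n for f in frequencies]

for interval, f, rel in zip(intervals, frequencies, relative):
    print(f"{interval}  f={f:3d}  rel f={rel:4.1f}%")
```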


    Cumulative frequency distributions

    This is also expressed as a percentage; it shows the percentage of elements lying within and below each class interval.

    Although a group may be called the 211-220 group, this group actually includes the range of scores that lie from 210.5 up to and including 220.5, so these figures are the exact lower and upper limits of the group.

    The relative frequency column shows that 2% of the distribution lies in the 161-170 group and 2.5% lies in the 171-180 group; therefore, a total of 4.5% of the distribution lies at or below a score of 180.5, as shown by the cumulative frequency column.


    A further 6% of the distribution lies in the 181-190 group; therefore, a total of (2 + 2.5 + 6) = 10.5% lies at or below a score of 190.5.

    A man with a serum cholesterol level of 190 mg/dl can be told that roughly 10% of this sample had lower levels than his, and approximately 90% had scores above his.

    The cumulative frequency of the highest group (251-260) must be 100, showing that 100% of the distribution lies at or below a score of 260.5.
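
    The running totals described above can be reproduced by accumulating the frequencies from the lowest class interval upward and converting each total to a percentage. A short sketch, with the frequencies copied from the table:

```python
from itertools import accumulate

# Frequencies from the table, lowest class interval (161-170) first.
upper_limits = [170.5, 180.5, 190.5, 200.5, 210.5,
                220.5, 230.5, 240.5, 250.5, 260.5]
frequencies = [4, 5, 12, 14, 72, 38, 18, 19, 13, 5]

n = sum(frequencies)

# Running count of elements at or below each exact upper limit, as a percentage.
cumulative_pct = [100 * count / n for count in accumulate(frequencies)]

for limit, pct in zip(upper_limits, cumulative_pct):
    print(f"at or below {limit}: {pct:5.1f}%")
```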


    Presentation of Statistical Data


    Statistical data, once collected, must be arranged purposively, in order to bring out the important points clearly and strikingly.

    Therefore the manner in which statistical data are presented is of utmost importance.

    There are several methods of presenting data: tables, charts, diagrams, graphs, pictures, and special curves.


    Methods of presenting data

    Tables

    Diagrams

    Bar charts

    Histogram

    Frequency polygon

    Pie charts

    Pictogram


    Bar charts

    To display nominal scale data, a bar graph is typically used. For example, if a group of 100 men had a mean serum cholesterol value of 212 mg/dl, and a group of 100 women had a mean value of 185 mg/dl, the means of these two groups could be presented as a bar graph.

    Bar graphs are identical to frequency histograms, except that each rectangle on the graph is clearly separated from the others by a space, showing that the data form separate categories (such as male and female) rather than continuous groups.


    Histogram


    Frequency polygon

    For ratio or interval scale data, a frequency distribution may be drawn as a frequency polygon, in which the midpoints of each class interval are joined by straight lines.


    A cumulative frequency distribution can also be presented graphically as a polygon.

    Cumulative frequency polygons typically form a characteristic S-shaped curve known as an ogive.


    Pie chart

    Instead of comparing the length of a bar, the areas of segments of a circle are compared.

    The area of each segment depends upon the angle.

    It is often necessary to indicate the percentages in the segments as it may not be easy to compare the areas of segments.


    Pictogram

    Pictograms are a popular method of presenting data to the layman.

    Small pictures or symbols are used to present the data.

    For example, a picture of a doctor can be used to represent the population per physician.

    Fractions of the picture can be used to represent numbers smaller than the value of a whole symbol.


    Centiles and other quantiles

    The cumulative frequency polygon and the cumulative frequency distribution both illustrate the concept of centile (or percentile) rank, which states the percentage of observations that fall below any particular score.

    In the case of a grouped frequency distribution, centile ranks state the percentage of observations that fall within or below any given class interval.

    Centile ranks provide a way of giving information about one individual score in relation to all the other scores in a distribution.


    For example, the cumulative frequency column of the above table shows that 91% of the observations fall below 240.5 mg/dl, which therefore represents the 91st centile (which can be written as C91).

    A man with a serum cholesterol level of 240 mg/dl lies at the 91st centile: about 9% of the scores in the sample are higher than his.
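
    Reading a centile rank off the grouped table amounts to finding the class interval that contains the score and returning its cumulative percentage. The sketch below is one possible lookup; the function name and the choice of a binary search are assumptions made for illustration, while the numbers come from the table.

```python
from bisect import bisect_left

# Exact upper limits of the class intervals and their cumulative percentages.
LOWEST_LIMIT = 160.5
upper_limits = [170.5, 180.5, 190.5, 200.5, 210.5,
                220.5, 230.5, 240.5, 250.5, 260.5]
cumulative_pct = [2.0, 4.5, 10.5, 17.5, 53.5, 72.5, 81.5, 91.0, 97.5, 100.0]


def centile_rank(score):
    """Percentage of observations within or below the score's class interval."""
    if not LOWEST_LIMIT < score <= upper_limits[-1]:
        raise ValueError("score lies outside the tabulated range")
    # Index of the first class interval whose upper limit is >= score.
    return cumulative_pct[bisect_left(upper_limits, score)]


print(centile_rank(240))  # 91.0
print(centile_rank(190))  # 10.5
```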


    Centile ranks are widely used in reporting scores on educational tests.

    They are one member of a family of values called quantiles, which divide distributions into a number of equal parts.

    Centiles divide a distribution into 100 equal parts.

    Other quantiles include quartiles, which divide a distribution into 4 equal parts, and deciles, which divide a distribution into 10 equal parts.
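
    Python's standard library can compute the cut points for any of these quantiles. The sketch below uses `statistics.quantiles` on made-up scores (1 to 200, purely illustrative); dividing a distribution into k equal parts takes k - 1 cut points.

```python
from statistics import quantiles

# Hypothetical scores, for illustration only.
scores = list(range(1, 201))

quartile_cuts = quantiles(scores, n=4)    # 3 cut points -> 4 equal parts
decile_cuts = quantiles(scores, n=10)     # 9 cut points -> 10 equal parts
centile_cuts = quantiles(scores, n=100)   # 99 cut points -> 100 equal parts

print(quartile_cuts)
print(len(decile_cuts), len(centile_cuts))
```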
