Introduction to biostatistics Lecture plan
1
Introduction to biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics
   - Confidence intervals
   - Hypothesis testing
2
DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.
3
Why do we need statistics?
??
4
Basic parts of statistics:
- Descriptive
- Inferential
5
Terminology
- Population
- Sample
- Variables
6
Variable types
- Categorical (qualitative)
- Numerical (quantitative)
- Combined
7
Categorical data
- Nominal
  - 2 categories
  - >2 categories
- Ordinal
8
Numerical data
- Continuous
- Discrete
9
Description of categorical data
- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)
10
Frequencies and contingency tables
Of those who were unsatisfied, 4 were males and 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
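The percentages in the table can be recomputed from the four cell counts. The sketch below assumes they are column percentages (each count divided by the total for that sex), which is what the 100% column totals suggest. The variable names are illustrative only.

```python
# Cell counts from the slide: satisfaction by sex.
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}
sexes = ("Males", "Females")

# Column totals (18 males, 32 females) and the grand total (50).
col_totals = {sex: sum(row[sex] for row in counts.values()) for sex in sexes}
grand_total = sum(col_totals.values())

# Column percentages: share of each answer within each sex.
column_pct = {
    answer: {sex: 100 * row[sex] / col_totals[sex] for sex in sexes}
    for answer, row in counts.items()
}

# Percentages for the "Total" column: share of each answer overall.
total_pct = {
    answer: 100 * sum(row.values()) / grand_total
    for answer, row in counts.items()
}

for answer in counts:
    cells = ", ".join(f"{sex} {column_pct[answer][sex]:.1f}%" for sex in sexes)
    print(f"{answer}: total {total_pct[answer]:.1f}%, {cells}")
```

Printed values may differ from the slide in the last digit because of rounding.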
11
Graphical presentation
[Figure: two charts, "Sex structure in Lithuania, 1993" and "Sex structure in Lithuania, 1991"; categories: men, women]
12
Graphical presentation
[Figure: chart "Sex structure in Lithuania" for 1993 and 1996; series: men, women; percentage axis from 44% to 54%]
13
Graphical presentation
[Figure: chart "Sex structure in Lithuania" for 1993 and 1996; series: women, men; percentage axis from 0% to 120%]
14
Graphical presentation
[Figure: chart with a percentage axis from 0% to 100% for Croatia, Denmark, Sweden, Finland, France, Ireland, Norway, Russia, Slovakia, Slovenia and Lithuania. Legend: J01A Tetracyclines; J01C Penicillins; J01D Other β-lactam antibiotics; J01E Sulfonamides and trimethoprim; J01F Macrolides, lincosamides, streptogramins; J01M Quinolones; J01X Other]
15
Graphical presentation
Other:
- Maps
- Chernoff faces
- Star plots, etc.
16
Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variance
- Assessing normality
17
Grouping
- Sorting data
- Groups (5-17 groups) according to the researcher's criteria.
Used to assess the distribution and for graphical presentation in Excel.
18
Frequencies, their comparison and calculation
197 students were asked about the amount of money (litas) they had in cash at the moment.

Number      Frequency        Cumulative frequency
of litas    n      %         n               %
 1           1     0.5       1               0.5
 2           2     1.0       1+2=3           1.5
 3           4     2.0       3+4=7           3.6
 4           8     4.1       7+8=15          7.6
 5          15     7.6       15+15=30       15.2
 6          24    12.2       30+24=54       27.4
 7          29    14.7       54+29=83       42.1
 8          31    15.7       83+31=114      57.9
 9          29    14.7       114+29=143     72.6
10          24    12.2       143+24=167     84.8
11          15     7.6       167+15=182     92.4
12           8     4.1       182+8=190      96.4
13           4     2.0       190+4=194      98.5
14           2     1.0       194+2=196      99.5
15           1     0.5       196+1=197     100.0
Total      197   100.0
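The running sums in the cumulative column can be reproduced with a few lines of Python. This is a minimal sketch using the frequencies from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for 1..15 litas, taken from the table.
frequencies = [1, 2, 4, 8, 15, 24, 29, 31, 29, 24, 15, 8, 4, 2, 1]
total = sum(frequencies)  # number of students asked

# Relative frequency (%), cumulative count, cumulative frequency (%).
relative_pct = [100 * n / total for n in frequencies]
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative]

rows = zip(frequencies, relative_pct, cumulative, cumulative_pct)
for litas, (n, pct, c, cpct) in enumerate(rows, start=1):
    print(f"{litas:>2} {n:>3} {pct:5.1f} {c:>4} {cpct:6.1f}")
```

Each cumulative count is the previous cumulative count plus the current frequency, exactly as in the "1+2=3" style sums above.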
19
Graphical presentation of frequencies
20
Normal distributions
- Most observations lie around the center
- Fewer observations lie above and below the central values, in approximately equal proportions
- Most often the Gaussian distribution
21
Not normal distributions
- More observations in one part.
22
Asymmetrical distribution
23
How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Variance
24
MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles
25
MEASURES OF CENTRAL TENDENCY
Arithmetic mean (X, μ)
26
MEASURES OF CENTRAL TENDENCY
Median (Me): the middle value, or 50th percentile (the value of the observation that divides the sorted data into almost equal parts).
It is found this way:
- When n is odd: the median is the middle observation, at position (n+1)/2 in the sorted data.
- When n is even: the median is the average of the values of the two middle observations.
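The odd/even rule for the median can be written out directly. This is a minimal sketch; the function name and the sample values are illustrative, not taken from the lecture.

```python
def median(values):
    """Median by the odd/even rule: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the middle observation, position (n + 1) / 2 when counting from 1
        return ordered[mid]
    # n even: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```

Python's standard library offers the same calculation as `statistics.median`.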
27
MEASURES OF CENTRAL TENDENCY
Mode (Mo): the most common value.
There can be more than one mode.
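A short sketch of finding the mode when several values tie for most common. The ages below are hypothetical; `statistics.multimode` is part of the Python standard library (3.8 and later).

```python
from statistics import multimode

# Hypothetical sample: 22 and 23 both occur twice.
ages = [21, 22, 22, 23, 23, 25]

# multimode returns every value that shares the highest count.
modes = multimode(ages)
print(modes)
```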
28
MEASURES OF CENTRAL TENDENCY
Quartiles (Q1, Q2, Q3, Q4): the sorted sample is divided into 4 equal parts, with 25% of the observations in each of them.
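As a sketch, the cut points between the four parts can be computed with `statistics.quantiles` from the Python standard library. The data are hypothetical, and different programs use slightly different interpolation rules, so quartiles from other software may differ a little.

```python
from statistics import median, quantiles

# Hypothetical sample: the integers 1 to 11.
values = list(range(1, 12))

# n=4 asks for the three cut points that split the data into 4 parts.
q1, q2, q3 = quantiles(values, n=4)

print(q1, q2, q3)
print(q2 == median(values))  # the second quartile is the median
```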
29
Is a measure of central tendency enough to describe respondents?
30
MEASURES OF VARIANCE
- Min and max
- Range
- Standard deviation (SD): square root of the variance
- Variance: V = ∑(xᵢ − X)² / (n − 1)
- Interquartile range (IQRT): Q3 − Q1, or 75% − 25%
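The measures listed above can be computed step by step. This is a minimal sketch on a hypothetical sample; the variance uses the n − 1 denominator from the slide, and the quartiles come from `statistics.quantiles`, whose interpolation rule is one of several in common use.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

n = len(values)
mean = sum(values) / n

# Range: distance between the largest and smallest observation.
value_range = max(values) - min(values)

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Interquartile range: Q3 - Q1.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(min(values), max(values), value_range)
print(round(variance, 3), round(sd, 3), iqr)
```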
31
What measures are to be used for sample description?
If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)
If the distribution is NOT NORMAL:
- Median
- IQRT or min/max
Those measures are also used with numeric ordinal data.
32
X, Mo, Me
Mean ~ Median ~ Mode, SD and the empirical rule
33
EMPIRICAL RULE
Number of observations (%) within 1, 2 and 2.5 SD of the mean, if the distribution is normal
34
Example
X − 2SD, X + 2SD
X = 8
SD = 2.5
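For the example values X = 8 and SD = 2.5, the intervals and the share of a normal distribution they cover can be worked out as follows. This is a sketch: the function name is illustrative, and the shares come from the standard normal distribution via `math.erf`.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


mean, sd = 8, 2.5  # values from the example

for k in (1, 2, 2.5):
    low, high = mean - k * sd, mean + k * sd
    print(f"within {k} SD: {low} to {high}, "
          f"about {100 * share_within(k):.1f}% of observations")
```

With these numbers, X ± 2SD runs from 3 to 13, which should hold roughly 95% of the observations if the data are normally distributed.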
35
Normality assessment
Summary
- Graphical
- Comparison of measures of central tendency; empirical rule (mean and standard deviation)
- Skewness and kurtosis (if Gaussian, = 0)
- Kolmogorov-Smirnov test
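Skewness and kurtosis can be computed from the central moments of the sample. The sketch below uses the simple moment-based formulas and reports excess kurtosis (kurtosis minus 3), which is the version that equals 0 for a Gaussian distribution; statistical packages often apply small-sample corrections, so their output may differ slightly. The data and function names are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    # Third central moment scaled by the variance to the power 1.5.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    # Fourth central moment scaled by the squared variance, minus 3.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

A skewness near 0 points to a symmetric distribution; a clearly positive value points to a long right tail.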
Boxplot
[Figure: boxplot marking the median, the mean (*), the 25th and 75th percentiles, and outliers]
Boxplot example
[Figure: boxplot with an axis from 14.00 to 26.00]
Central limit theorem
39
Inferential statistics
- Confidence intervals
- Hypothesis testing
40
Confidence intervals
The interval in which the "true" value most likely lies.
41
The variance of samples and their measures
[Figure: a population with μ, σ, p0 and four samples drawn from it: X1, SD1, p1; X2, SD2, p2; X3, SD3, p3; X4, SD4, p4]
42
The variance of samples and confidence intervals
[Figure: population values μ, p0]
43
Confidence interval
Statistical definition:
If the study were carried out 100 times, giving 100 results and 100 CIs, the "true" value would lie inside the interval in about 95 of the 100 cases and outside it in about 5 of the 100.
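The repeated-study definition can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical normal population with mean 8 and SD 2.5, samples of 100 observations, and the interval mean ± 1.96 SE with SE = SD / √n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 8, 2.5  # hypothetical population
N, STUDIES = 100, 1000       # sample size per study, number of studies

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this study's 95% CI contain the true mean?
    if mean - 1.96 * se <= TRUE_MEAN <= mean + 1.96 * se:
        covered += 1

coverage = covered / STUDIES
print(f"{100 * coverage:.1f}% of the intervals contain the true mean")
```

The share of intervals that contain the true mean should come out close to 95%, with some variation from run to run.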
44
Confidence intervals (general, most common calculation)
95% CI: X ± 1.96 SE, giving (Xmin; Xmax)
Note: for a normal distribution, when n is large
95% CI: p ± 1.96 SE, giving (pmin; pmax)
Note: when p and 1−p > 5/n
45
Standard error (SE)
- Numeric data (X)
- Categorical data (p)
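The formulas on this slide did not survive in the transcript. The sketch below assumes the standard textbook expressions, SE = SD / √n for a mean and SE = √(p(1 − p) / n) for a proportion, and plugs them into the 95% CI formula. The function names and the sample size of 100 for the mean are hypothetical; the proportion uses the 40 satisfied respondents out of 50 from the contingency table.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean (assumed formula: SD / sqrt(n))."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion (assumed: sqrt(p(1 - p) / n))."""
    return math.sqrt(p * (1 - p) / n)


def ci95(estimate, se):
    """95% confidence interval: estimate ± 1.96 SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Mean: X = 8, SD = 2.5, with a hypothetical sample size n = 100.
mean_ci = ci95(8, se_mean(2.5, 100))

# Proportion: 40 of 50 respondents satisfied.
prop_ci = ci95(40 / 50, se_proportion(40 / 50, 50))

print(mean_ci)
print(prop_ci)
```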
46
Width of the confidence interval
depends on:
a) sample size;
b) confidence level (guarantee: usually 95%, but any % can be chosen);
c) dispersion.
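How the three factors act on the width can be seen in a small sketch for the CI of a mean. It assumes width = 2 · z · SD / √n, with z taken from the standard normal distribution; the function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Width of a CI for a mean, assuming width = 2 * z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return 2 * z * sd / math.sqrt(n)


print(ci_width(2.5, 100))              # baseline
print(ci_width(2.5, 400))              # larger sample: narrower
print(ci_width(2.5, 100, level=0.99))  # higher confidence level: wider
print(ci_width(5.0, 100))              # more dispersion: wider
```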
47
Hypothesis testing
H0: μ1 = μ2; p1 = p2 (RR = 1, OR = 1, difference = 0)
HA: μ1 ≠ μ2; p1 ≠ p2 (two-sided, one-sided)
48
Hypothesis testing
Significance level α (agreed 0.05).
Test for the P value (t-test, χ², etc.).
The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true.
OR
The P value is the probability of getting the difference (association) due to chance alone, when the null hypothesis is true.
49
Statistical conventions
If P < 0.05, we say that the results can't be explained by chance alone, therefore we reject H0 and accept HA.
If P ≥ 0.05, we say that the difference found can be due to chance alone, therefore we don't reject H0.
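The decision rule can be tried on the satisfaction data from the contingency table (14 of 18 males and 26 of 32 females satisfied). The sketch below uses a two-sided z test for two proportions as one possible choice; with counts this small an exact test may be more appropriate, so treat it as an illustration of the rule rather than the recommended analysis. The function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(14, 18, 26, 32)

alpha = 0.05  # agreed significance level
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, P = {p_value:.2f}: {decision}")
```

Here P is far above 0.05, so the difference in satisfaction between males and females can be due to chance alone and H0 is not rejected.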
50
Tests
Test depends on:
- Study design
- Variable type
- Distribution
- Number of groups, etc.
Tests (probability distributions):
- z test
- t test (one sample, two independent, paired)
- χ² (+ trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon and others.
51
Inferential statistics
Summary
The P value tells whether there is a statistically significant difference (association).
The CI gives the interval where the true value can be.
52
Inferential statistics
Summary
Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).
Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.