Introduction to biostatistics Lecture plan

Lecture plan

1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypothesis testing

DEFINITIONS

STATISTICS can mean two things:
- the numbers we get when we measure and count things (data);
- a collection of procedures for describing and analysing data.

BIOSTATISTICS is the application of statistics in the natural sciences, where biomedical problems are analysed.

Why do we need statistics?

Basic parts of statistics:

- Descriptive
- Inferential

Terminology

- Population
- Sample
- Variables

Variable types

- Categorical (qualitative)
- Numerical (quantitative)
- Combined

Categorical data

- Nominal:
  - 2 categories
  - more than 2 categories
- Ordinal

Numerical data

- Continuous
- Discrete

Description of categorical data

- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)

Frequencies and contingency tables

Of those who were unsatisfied, 4 were males and 6 were females.

              Total       Males        Females
Satisfied     40 (80%)    14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)    4 (22.2%)    6 (18.7%)
Total         50 (100%)   18 (100%)    32 (100%)
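A minimal Python sketch of how the column percentages in the table can be computed from the raw counts. The dictionary layout and the function name column_percentages are illustrative choices, not part of the lecture; percentages are rounded to two decimals here, while the table shows one.

```python
# Counts from the satisfaction-by-sex table: {row: {column: count}}
counts = {
    "Satisfied": {"Total": 40, "Males": 14, "Females": 26},
    "Unsatisfied": {"Total": 10, "Males": 4, "Females": 6},
}

def column_percentages(table):
    """Express every cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        row_name: {c: round(100 * row[c] / col_totals[c], 2) for c in columns}
        for row_name, row in table.items()
    }

for row_name, row in column_percentages(counts).items():
    print(row_name, row)
```

Column percentages answer "what share of males (or females) were satisfied?"; dividing by row totals instead would answer "what share of the satisfied were males?".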

Graphical presentation

[Figure: two charts, "Sex structure in Lithuania, 1993" and "Sex structure in Lithuania, 1991"; categories: men, women]

Graphical presentation

[Figure: "Sex structure in Lithuania", 1993 and 1996; percentage axis from 44% to 54%; series: men, women]

Graphical presentation

[Figure: "Sex structure in Lithuania", 1993 and 1996; percentage axis from 0% to 120%; series: women, men]

Graphical presentation

[Figure: percentage chart (0% to 100%) by country: Croatia, Denmark, Sweden, Finland, France, Ireland, Norway, Russia, Slovakia, Slovenia, Lithuania; series (ATC groups): J01A Tetracyclines, J01C Penicillins, J01D Other β-lactam antibiotics, J01E Sulfonamides and trimethoprim, J01F Macrolides, lincosamides and streptogramins, J01M Quinolones, J01X Other]

Graphical presentation

Other types:
- Maps
- Chernoff faces
- Star plots, etc.

Description of numerical data

- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variability
- Assessing normality

Grouping

- Sorting the data
- Forming groups (5-17 groups) according to the researcher's criteria

Used to assess the distribution and for graphical presentation in Excel.
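A minimal sketch of grouping numerical data, assuming equal-width classes (the lecture leaves the grouping criteria to the researcher). The function group_equal_width and the sample values are hypothetical.

```python
def group_equal_width(values, k):
    """Split values into k equal-width classes.

    Returns (lower, upper, count) for each class; every class is [lower, upper)
    except the last one, which also includes the maximum value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot form classes")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # Class index of v; min() puts the maximum value into the last class
        i = min(int((v - lo) / width), k - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical measurements grouped into 5 classes
data = [2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1, 7.8, 8.0, 9.9]
for lower, upper, n in group_equal_width(data, 5):
    print(f"{lower:.2f}-{upper:.2f}: {n}")
```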

Frequencies, their comparison and calculation

197 students were asked about the amount of money (litas) they had in cash at the moment.

        Frequency     Cumulative frequency
Litas   n     %       n             %
1       1     0.5     1             0.5
2       2     1.0     1+2=3         1.5
3       4     2.0     3+4=7         3.6
4       8     4.1     7+8=15        7.6
5       15    7.6     15+15=30      15.2
6       24    12.2    30+24=54      27.4
7       29    14.7    54+29=83      42.1
8       31    15.7    83+31=114     57.9
9       29    14.7    114+29=143    72.6
10      24    12.2    143+24=167    84.8
11      15    7.6     167+15=182    92.4
12      8     4.1     182+8=190     96.4
13      4     2.0     190+4=194     98.5
14      2     1.0     194+2=196     99.5
15      1     0.5     196+1=197     100.0
Total   197   100.0
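The frequency and cumulative frequency columns can be reproduced with a short Python sketch. The counts are copied from the table; the function name frequency_table is illustrative.

```python
# Number of students for each amount of cash (1-15 litas), from the table
counts = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15, 6: 24, 7: 29, 8: 31,
          9: 29, 10: 24, 11: 15, 12: 8, 13: 4, 14: 2, 15: 1}

def frequency_table(counts):
    """Return rows (value, n, %, cumulative n, cumulative %) sorted by value."""
    total = sum(counts.values())
    rows, cum = [], 0
    for value in sorted(counts):
        n = counts[value]
        cum += n  # running total: previous cumulative n + this n
        rows.append((value, n, round(100 * n / total, 1),
                     cum, round(100 * cum / total, 1)))
    return rows

table = frequency_table(counts)
for row in table:
    print(*row, sep="\t")
```

The cumulative percentage answers questions such as "what share of students had at most 8 litas?" (57.9%).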

Graphical presentation of frequencies

Normal distributions

- Most observations lie around the centre
- Fewer observations lie above and below the central values, in approximately equal proportions
- Most often referred to as the Gaussian distribution

Non-normal distributions

More observations in one part of the range.

Asymmetrical distribution

How would you describe/present your respondents if the data are numeric?

Two groups of measures:
1. Central tendency (central value, average)
2. Variability (dispersion)

MEASURES OF CENTRAL TENDENCY

- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles

MEASURES OF CENTRAL TENDENCY

Arithmetic mean ($\bar{X}$ for a sample, $\mu$ for a population)
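The transcript does not preserve the formula for the arithmetic mean; the standard definition, for a sample of n observations and a population of N values, is:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```

For example, for the values 2, 4 and 9: $\bar{X} = (2 + 4 + 9)/3 = 5$.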

MEASURES OF CENTRAL TENDENCY

Median (Me): the middle value, or 50th percentile (the value of the observation that divides the sorted data into two (almost) equal halves). It is found as follows:
- when n is odd, the median is the middle observation, at position $\frac{n+1}{2}$;
- when n is even, the median is the average of the values of the two middle observations.
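A short Python sketch of the rule above. The helper median is illustrative; the standard library's statistics.median applies the same rule.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    if not values:
        raise ValueError("median of empty data")
    s = sorted(values)
    n = len(s)
    mid = n // 2  # 0-based index; for odd n this is position (n + 1) / 2 counted from 1
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # odd n, sorted 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # even n, sorted 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```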

MEASURES OF CENTRAL TENDENCY

Mode (Mo): the most common value. There can be more than one mode.
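In Python, statistics.multimode returns all of the most common values, which covers the case of more than one mode. The data below are hypothetical.

```python
from statistics import multimode

# Hypothetical data in which 3 and 7 are equally common
data = [2, 3, 3, 5, 7, 7, 9]
print(multimode(data))  # → [3, 7]: two modes (bimodal data)
```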

MEASURES OF CENTRAL TENDENCY

Quartiles ($Q_1$, $Q_2$, $Q_3$): values that divide the sorted sample into 4 equal parts, with 25% of the observations in each part; $Q_2$ is the median.
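Quartiles can be computed with statistics.quantiles. Programs use different interpolation rules, so quartiles of small samples can differ slightly between software; this sketch assumes the "inclusive" method and uses hypothetical data.

```python
from statistics import median, quantiles

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical sample
# n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # → 4.0 7.0 10.0
print(q2 == median(data))  # Q2 coincides with the median
```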

Is a measure of central tendency enough to describe the respondents?

MEASURES OF VARIABILITY (DISPERSION)

- Minimum and maximum
- Range
- Standard deviation (SD): the square root of the variance
- Variance: $V = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}$
- Interquartile range (IQR): $Q_3 - Q_1$, i.e. the 75th minus the 25th percentile
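The measures above computed in Python for a hypothetical sample. statistics.variance and statistics.stdev use the n - 1 denominator from the formula; the IQR relies on statistics.quantiles with the "inclusive" method, and other software may interpolate quartiles differently.

```python
from statistics import fmean, quantiles, stdev, variance

data = [4, 8, 6, 5, 3, 7, 9, 6]  # hypothetical sample

value_range = max(data) - min(data)
mean = fmean(data)
manual_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)  # V from the formula
var = variance(data)   # library version, also divides by n - 1
sd = stdev(data)       # square root of the variance
q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1          # spread of the middle 50% of the data

print(min(data), max(data), value_range, var, sd, iqr)
```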

What measures should be used to describe a sample?

If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)

If the distribution is NOT NORMAL:
- Median
- IQR or min/max

These measures are also used with numeric ordinal data.

$\bar{X}$, Mo, Me

For a normal distribution: mean ≈ median ≈ mode; SD and the empirical rule

EMPIRICAL RULE

Percentage of observations within 1, 2 and 2.5 SD of the mean if the distribution is normal
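The percentages behind the rule follow from the normal distribution: the share of observations within k SD of the mean is erf(k/√2). A small Python sketch (share_within is an illustrative name) gives about 68% for 1 SD, 95% for 2 SD and almost 99% for 2.5 SD; the rule is also often quoted for 3 SD (about 99.7%).

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k SD of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 2.5, 3):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```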

Example

[Figure: distribution with $\bar{X} = 8$ and $SD = 2.5$; the interval from $\bar{X} - 2SD$ to $\bar{X} + 2SD$ is marked]
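Working the example through (assuming, as the rule requires, a normal distribution):

```latex
\bar{X} - 2SD = 8 - 2 \times 2.5 = 3, \qquad \bar{X} + 2SD = 8 + 2 \times 2.5 = 13
```

So roughly 95% of the observations are expected to lie between 3 and 13.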

Normality assessment: summary

- Graphical methods
- Comparison of the measures of central tendency; the empirical rule (mean and standard deviation)
- Skewness and kurtosis (both equal 0 for a Gaussian distribution, with kurtosis expressed as excess kurtosis)
- Kolmogorov-Smirnov test
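A minimal Python sketch of moment-based skewness and excess kurtosis, using the simple formulas g1 = m3 / m2^1.5 and g2 = m4 / m2^2 - 3 (the function name is illustrative). SciPy's scipy.stats.skew and scipy.stats.kurtosis use these same formulas by default, while some statistical packages apply small-sample corrections, so reported values can differ slightly. For the Kolmogorov-Smirnov test itself, SciPy provides scipy.stats.kstest.

```python
def skewness_kurtosis(values):
    """Moment-based skewness g1 and excess kurtosis g2 (both 0 for a Gaussian distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    if m2 == 0:
        raise ValueError("all values are equal")
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3   # subtract 3 so that a Gaussian distribution gives 0
    return g1, g2

print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetric: skewness 0
print(skewness_kurtosis([1, 1, 1, 2, 10]))   # long right tail: positive skewness
```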

Boxplot

[Figure: boxplot labelled with the median, the mean (*), the 25th and 75th percentiles, and outliers]
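The lecture does not say which rule marks the outliers on a boxplot. A common convention (Tukey's, used by many programs) flags values more than 1.5 × IQR below Q1 or above Q3. A minimal Python sketch under that assumption, with hypothetical data and quartiles from statistics.quantiles ("inclusive" method):

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [10, 12, 12, 13, 14, 15, 16, 40]  # hypothetical sample
print(tukey_outliers(data))  # → [40]
```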

Boxplot example

[Figure: example boxplot; value axis from 14.00 to 26.00]

Central limit theorem
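The central limit theorem says that, for large enough n, the means of repeated samples are approximately normally distributed around the population mean with standard deviation σ/√n, even when the population itself is not normal. A small simulation sketch in Python; the exponential population (mean 1, SD 1), the sample sizes and the number of repetitions are arbitrary assumptions. It checks the centre and the σ/√n spread; a histogram of the means for growing n would show the shape becoming bell-like.

```python
import random
import statistics

random.seed(1)

def sample_means(n, repeats=2000):
    """Means of `repeats` random samples of size n from an exponential population (mean 1, SD 1)."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repeats)]

print("n, mean of means, SD of means, sigma/sqrt(n)")
for n in (2, 10, 50):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 3),
          round(statistics.stdev(means), 3), round(1 / n ** 0.5, 3))
```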

Inferential statistics

- Confidence intervals
- Hypothesis testing

Confidence intervals

The interval in which the “true” value most likely lies.

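The variability of samples and their measures

[Figure: a population with parameters $\mu$, $\sigma$, $p_0$ and samples drawn from it, each with its own estimates: $\bar{X}_1$, $SD_1$, $p_1$; $\bar{X}_2$, $SD_2$, $p_2$; $\bar{X}_3$, $SD_3$, $p_3$; $\bar{X}_4$, $SD_4$, $p_4$]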

The variability of samples and confidence intervals

[Figure: sample confidence intervals shown relative to the population values $\mu$ and $p_0$]

Confidence interval

Statistical definition:

If the study were carried out 100 times, giving 100 results and 100 CIs, then (for a 95% CI) about 95 of those 100 intervals would contain the “true” value, and about 5 would not.
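The repeated-study definition can be checked by simulation. In this Python sketch the population (normal, mean 50, SD 10), the sample size of 100 and the 1,000 repeated "studies" are arbitrary assumptions; with the 1.96 × SE interval, close to 95% of the intervals should contain the true mean.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)
MU, SIGMA, N, STUDIES = 50.0, 10.0, 100, 1000  # hypothetical population and design
Z = NormalDist().inv_cdf(0.975)                 # two-sided 95% multiplier, about 1.96

hits = 0
for _ in range(STUDIES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # Count the "studies" whose 95% CI contains the true population mean
    if mean - Z * se <= MU <= mean + Z * se:
        hits += 1

coverage = hits / STUDIES
print(f"{hits} of {STUDIES} intervals contain the true mean")
```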

Confidence intervals (general, most common calculation)

95% CI: $\bar{X} \pm 1.96\,SE$, giving ($\bar{X}_{min}$; $\bar{X}_{max}$)
Note: for a normal distribution, when n is large.

95% CI: $p \pm 1.96\,SE$, giving ($p_{min}$; $p_{max}$)
Note: when both $p$ and $1 - p$ are greater than $5/n$.
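Both formulas as a Python sketch. The function names are illustrative; the mean interval uses the large-sample z multiplier exactly as on the slide (for small samples a t multiplier is usually preferred), and the proportion interval refuses to run when the 5/n condition fails.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_mean(mean, sd, n, z=Z95):
    """Large-sample CI for a mean: mean ± z * SE, with SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def ci_proportion(successes, n, z=Z95):
    """CI for a proportion: p ± z * SE, with SE = sqrt(p(1 - p) / n).

    Only used when both p and 1 - p exceed 5/n, as the note requires.
    """
    p = successes / n
    if min(p, 1 - p) <= 5 / n:
        raise ValueError("p or 1 - p is not greater than 5/n; use another method")
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

print(ci_mean(8, 2.5, 100))   # hypothetical: mean 8, SD 2.5, n = 100
print(ci_proportion(40, 50))  # 40 satisfied out of 50 respondents
```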

Standard error (SE)

Numeric data ($\bar{X}$)

Categorical data ($p$)
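The transcript does not preserve the SE formulas; the standard ones, matching the 1.96 × SE intervals for $\bar{X}$ and $p$, are:

```latex
SE_{\bar{X}} = \frac{SD}{\sqrt{n}}, \qquad SE_{p} = \sqrt{\frac{p(1-p)}{n}}
```

For example, for 40 satisfied respondents out of 50 ($p = 0.8$): $SE_p = \sqrt{0.8 \times 0.2 / 50} \approx 0.057$.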

The width of a confidence interval depends on:

a) sample size;
b) confidence level (the “guarantee”; usually 95%, but any % can be used);
c) dispersion.
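A Python sketch showing the three effects, using the large-sample interval for a mean (width = 2 × z × SD / √n); ci_width is an illustrative helper and the numbers are hypothetical. Because n sits under a square root, quadrupling the sample size halves the width.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Full width (upper - lower) of a large-sample CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for level = 0.95
    return 2 * z * sd / math.sqrt(n)

print(ci_width(sd=10, n=25), ci_width(sd=10, n=100))       # a) larger sample -> narrower
print(ci_width(sd=10, n=100, level=0.90),
      ci_width(sd=10, n=100, level=0.99))                  # b) higher level -> wider
print(ci_width(sd=5, n=100), ci_width(sd=20, n=100))       # c) more dispersion -> wider
```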

Hypothesis testing

$H_0$: $\mu_1 = \mu_2$; $p_1 = p_2$ (RR = 1, OR = 1, difference = 0)

$H_A$: $\mu_1 \neq \mu_2$; $p_1 \neq p_2$ (two-sided or one-sided)

Hypothesis testing

Significance level α (conventionally 0.05).

A statistical test (t-test, $\chi^2$ test, etc.) gives the P value.

The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true.

OR: the P value is the probability that a difference (association) at least this large arises by chance alone, when the null hypothesis is true.
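The definition can be made concrete with an exact permutation test, a technique not named in the lecture and used here only as an illustration: under $H_0$ the group labels are interchangeable, so the P value is the share of all relabellings that give a difference in means at least as large as the observed one. The data and the function name are hypothetical, and enumerating every split is feasible only for small samples.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = total = 0
    # Every way of choosing which k pooled values form "group A" is equally likely under H0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(fmean(a) - fmean(b)) >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return extreme / total

print(permutation_p_value([1, 2, 3], [10, 11, 12]))  # → 0.1 (2 of 20 splits)
```

Even with perfectly separated groups, 3 observations per group cannot give P < 0.05 here, because only 2 of the 20 possible splits are that extreme.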

Statistical conventions

If P < 0.05, we say that the results cannot be explained by chance alone; therefore we reject $H_0$ and accept $H_A$.

If P ≥ 0.05, we say that the difference found could be due to chance alone; therefore we do not reject $H_0$.

Tests

The choice of test depends on: study design, variable type, distribution, number of groups, etc.

Tests (probability distributions): z test; t test (one-sample, two independent samples, paired); $\chi^2$ (+ trend); F test; Fisher exact test; Mann-Whitney; Wilcoxon; and others.
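As an illustration (not an analysis from the lecture), the Fisher exact test from the list can be applied to the satisfaction-by-sex table. The expected number of unsatisfied males is 18 × 10 / 50 = 3.6, below 5, a common reason to prefer Fisher's test over $\chi^2$. This standard-library sketch uses an illustrative function name; SciPy's scipy.stats.fisher_exact offers a tested implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, cell a follows a hypergeometric distribution; the
    P value sums the probabilities of all tables no more likely than the
    observed one. Integer weights keep the comparison exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):  # proportional to P(cell a = x)
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / comb(n, col1)

# Rows: males, females; columns: unsatisfied, satisfied
p = fisher_exact_two_sided(4, 14, 6, 26)
print(p)  # → 1.0
```

The observed table is the most likely one given its margins, so the two-sided P value is 1: the data give no evidence of a difference in satisfaction between males and females.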

Inferential statistics: summary

The P value tells whether there is a statistically significant difference (association).

The CI gives the interval in which the true value is likely to lie.

Inferential statistics: summary

Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).

Neither the P value nor the CI tells us anything about the biological, clinical or public health meaning of the results.