Introduction to biostatistics Lecture plan

Post on 28-Jan-2016


Introduction to biostatistics: lecture plan. Basics. Variable types. Descriptive statistics: categorical data, numerical data. Inferential statistics: confidence intervals, hypothesis testing. Definitions. - PowerPoint PPT Presentation

Transcript of Introduction to biostatistics Lecture plan

Introduction to biostatistics
Lecture plan

1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics
   - Confidence intervals
   - Hypothesis testing

DEFINITIONS

STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.

BIOSTATISTICS – application of statistics in the natural sciences, when biomedical problems are analysed.

Why do we need statistics?

Basic parts of statistics:
- Descriptive
- Inferential

Terminology
- Population
- Sample
- Variables

Variable types
- Categorical (qualitative)
- Numerical (quantitative)
- Combined

Categorical data
- Nominal (2 categories, >2 categories)
- Ordinal

Numerical data
- Continuous
- Discrete

Description of categorical data
- Arranging data
- Frequencies, tables
- Visualization (graphical presentation)

Frequencies and contingency tables

From those who were unsatisfied 4 were males, 6 were females.

              Total        Males        Females
Satisfied     40 (80%)     14 (77.8%)   26 (81.3%)
Unsatisfied   10 (20%)      4 (22.2%)    6 (18.7%)
Total         50 (100%)    18 (100%)    32 (100%)
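As a minimal sketch, the column percentages in the satisfaction table can be recomputed from the raw counts. The dictionary layout and the helper name `column_percentages` are illustrative assumptions, not part of the lecture.

```python
# Counts from the satisfaction table: rows are answers, columns are sexes.
counts = {
    "Satisfied": {"Males": 14, "Females": 26},
    "Unsatisfied": {"Males": 4, "Females": 6},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total (hypothetical helper)."""
    columns = list(next(iter(table.values())).keys())
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / totals[c] for c in columns}
        for label, row in table.items()
    }


percentages = column_percentages(counts)
for label, row in percentages.items():
    cells = ", ".join(f"{c}: {v:.1f}%" for c, v in row.items())
    print(f"{label}: {cells}")
```

Percentages are taken within each column (sex) because the question is how satisfaction differs between males and females; row percentages would answer a different question.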

Graphical presentation

[Figure: two charts of the gender structure in Lithuania, 1993 and 1991; categories: men, women]

Graphical presentation

[Figure: chart of the gender structure in Lithuania, 1993 and 1996; vertical axis from 44% to 54%; categories: men, women]

Graphical presentation

[Figure: chart of the gender structure in Lithuania, 1993 and 1996; vertical axis from 0% to 120%; categories: women, men]

Graphical presentation

[Figure: chart with a vertical axis from 0% to 100%, by country (Croatia, Denmark, Sweden, Finland, France, Ireland, Norway, Russia, Slovakia, Slovenia, Lithuania). Legend: J01A Tetracyclines; J01C Penicillins; J01D Other β-lactam antibiotics; J01E Sulfonamides and trimethoprim; J01F Macrolides, lincosamides, streptogramins; J01M Quinolones; J01X Other]

Graphical presentation

Other:
- Maps
- Chernoff faces
- Star plots, etc.

Description of numerical data
- Arranging data
- Frequencies (relative and cumulative), graphical presentation
- Measures of central tendency and variance
- Assessing normality

Grouping
- Sorting data
- Groups (5-17 groups) according to the researcher's criteria.

To assess the distribution and for graphical presentation in Excel

Frequencies, their comparison and calculation

197 students were asked about the amount of money (litas) they had in cash at the moment.

Number      Frequency        Cumulative frequency
of litas     n      %        n                 %
   1         1     0.5       1                0.5
   2         2     1.0       1+2=3            1.5
   3         4     2.0       3+4=7            3.6
   4         8     4.1       7+8=15           7.6
   5        15     7.6       15+15=30        15.2
   6        24    12.2       30+24=54        27.4
   7        29    14.7       54+29=83        42.1
   8        31    15.7       83+31=114       57.9
   9        29    14.7       114+29=143      72.6
  10        24    12.2       143+24=167      84.8
  11        15     7.6       167+15=182      92.4
  12         8     4.1       182+8=190       96.4
  13         4     2.0       190+4=194       98.5
  14         2     1.0       194+2=196       99.5
  15         1     0.5       196+1=197      100.0
Total      197   100.0
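The running totals in the table can be reproduced with a short sketch. The frequencies are copied from the table; the variable names are illustrative only.

```python
from itertools import accumulate

# Frequencies from the table: number of litas (1..15) -> number of students.
frequencies = [1, 2, 4, 8, 15, 24, 29, 31, 29, 24, 15, 8, 4, 2, 1]
litas = range(1, 16)

total = sum(frequencies)                     # all students asked
cumulative = list(accumulate(frequencies))   # running totals, row by row

print("litas   n     %  cum.n  cum.%")
for value, n, cum_n in zip(litas, frequencies, cumulative):
    print(f"{value:5d} {n:3d} {100 * n / total:5.1f} {cum_n:6d} {100 * cum_n / total:6.1f}")
```

Each cumulative percentage is the share of students who had that amount or less, so the last row always reaches 100%.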

Graphical presentation of frequencies

Normal distributions
- Most observations lie around the center
- Fewer observations lie above and below the central values, in approximately equal proportions
- Most often a Gaussian distribution

Not normal distributions

More observations in one part.

Asymmetrical distribution

How would you describe/present your respondents if the data are numeric?

2 groups of measures:
1. Central tendency (central value, average)
2. Variance

MEASURES OF CENTRAL TENDENCY
- Means/averages (arithmetic, geometric, harmonic, etc.)
- Mode
- Median
- Quartiles

MEASURES OF CENTRAL TENDENCY

Arithmetic mean ($\bar{X}$, $\mu$)

MEASURES OF CENTRAL TENDENCY

Median (Me) – the middle value or 50th percentile (the value of the observation that divides the sorted data into almost equal parts).

It is found this way:
- When n is odd: the median is the middle observation, i.e. observation number $(n + 1)/2$ in the sorted data
- When n is even: the median is the average of the values of the two middle observations

MEASURES OF CENTRAL TENDENCY

Mode (Mo) – the most common value
- There can be more than one mode

MEASURES OF CENTRAL TENDENCY

Quartiles ($Q_1$, $Q_2$, $Q_3$, $Q_4$) – the sample is divided into 4 equal parts, with 25% of the observations in each of them.
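The measures of central tendency can be illustrated on the litas data from the frequency table. This is a sketch using Python's standard `statistics` module. Several quartile conventions exist, and the `"inclusive"` method used here is an assumption, not necessarily the one the lecture had in mind.

```python
import statistics

# Rebuild the 197 observations from the litas frequency table.
frequencies = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15, 6: 24, 7: 29, 8: 31,
               9: 29, 10: 24, 11: 15, 12: 8, 13: 4, 14: 2, 15: 1}
data = [value for value, n in frequencies.items() for _ in range(n)]

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)   # a list, because there can be several modes
# Three cut points split the sorted data into four parts of about 25% each.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(mean, median, modes, (q1, q2, q3))
```

The second cut point equals the median, and for this symmetric data set the mean, median and mode coincide.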

Is a measure of central tendency enough to describe the respondents?

MEASURES OF VARIANCE
- Min and max
- Range
- Standard deviation (SD) – square root of the variance
- Variance: $V = \sum (x_i - \bar{x})^2 / (n - 1)$
- Interquartile range ($Q_3 - Q_1$, or 75% - 25%), IQRT
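A sketch of these measures on the litas data from the frequency table. The variance is written out with the $n - 1$ denominator to mirror the formula; the quartile convention (`"inclusive"`) is an assumption.

```python
import math
import statistics

# Rebuild the 197 observations from the litas frequency table.
frequencies = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15, 6: 24, 7: 29, 8: 31,
               9: 29, 10: 24, 11: 15, 12: 8, 13: 4, 14: 2, 15: 1}
data = [value for value, n in frequencies.items() for _ in range(n)]

minimum, maximum = min(data), max(data)
value_range = maximum - minimum

mean = sum(data) / len(data)
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
sd = math.sqrt(variance)

# Interquartile range: distance between the third and first quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(minimum, maximum, value_range, round(variance, 2), round(sd, 2), iqr)
```

`statistics.variance` and `statistics.stdev` give the same results; the manual version is shown only to make the formula visible.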

What measures are to be used for sample description?

If distribution is NORMAL:
- Mean
- Variance (or standard deviation)

If distribution is NOT NORMAL:
- Median
- IQRT or min/max

Those measures are also used with numeric ordinal data

[Figure: distribution curve marked with $\bar{X}$, Mo, Me]

Mean ≈ Median ≈ Mode, SD and the empirical rule

EMPIRICAL RULE

Number of observations (%) within 1, 2 and 2.5 SD from the mean if the distribution is normal
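The percentages themselves did not survive in this transcript. As a hedged sketch, the theoretical shares for a normal distribution can be computed from the error function; the helper name `share_within` is illustrative.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 2.5):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```

If the observed shares in a sample are far from these theoretical values, that is one informal sign that the data are not normally distributed.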

Example

[Figure: distribution with $\bar{X} = 8$ and SD = 2.5, marked at $\bar{X} - 2SD$ and $\bar{X} + 2SD$]

Normality assessment: summary
- Graphical
- Comparison of measures of central tendency; empirical rule (mean and standard deviation)
- Skewness and kurtosis (if Gaussian = 0)
- Kolmogorov-Smirnov test
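A minimal sketch of the skewness and kurtosis check on the litas data from the frequency table, using simple moment-based formulas (statistical packages may apply small-sample corrections, so their values can differ slightly). Kurtosis is reported as excess kurtosis so that a Gaussian distribution gives 0. The Kolmogorov-Smirnov test is not implemented here.

```python
# Rebuild the 197 observations from the litas frequency table.
frequencies = {1: 1, 2: 2, 3: 4, 4: 8, 5: 15, 6: 24, 7: 29, 8: 31,
               9: 29, 10: 24, 11: 15, 12: 8, 13: 4, 14: 2, 15: 1}
data = [value for value, n in frequencies.items() for _ in range(n)]

n = len(data)
mean = sum(data) / n


def central_moment(order):
    """Average of (x - mean) ** order over the sample."""
    return sum((x - mean) ** order for x in data) / n


m2 = central_moment(2)
skewness = central_moment(3) / m2 ** 1.5
excess_kurtosis = central_moment(4) / m2 ** 2 - 3   # 0 for a Gaussian

print(round(skewness, 3), round(excess_kurtosis, 3))
```

Both values are close to 0 for this data set, which agrees with its symmetric, bell-shaped frequency table.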

Boxplot

[Figure: boxplot showing the median, the mean (*), the 25th and 75th percentiles, and outliers]

Boxplot example

[Figure: boxplot example; axis from 14.00 to 26.00]

Central limit theorem

Inferential statistics
- Confidence intervals
- Hypothesis testing

Confidence intervals

Interval where the “true” value is most likely to lie.

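The variance of samples and their measures

[Figure: a population with parameters $\mu$, $\sigma$, $p_0$ and four samples with their own estimates: $\bar{X}_1$, $SD_1$, $p_1$; $\bar{X}_2$, $SD_2$, $p_2$; $\bar{X}_3$, $SD_3$, $p_3$; $\bar{X}_4$, $SD_4$, $p_4$]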

The variance of samples and confidence intervals

[Figure: sample estimates and their confidence intervals relative to the population values $\mu$, $p_0$]

Confidence interval

Statistical definition:

If the study were carried out 100 times, giving 100 results and 100 CIs, the “true” value would lie inside the interval in 95 of the 100 cases and outside it in the remaining 5.
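This repeated-study definition can be checked by simulation. The sketch below draws many samples from a hypothetical normal population (the true mean, SD, sample size and number of studies are all made-up values), builds a 95% CI from each sample as mean ± 1.96 SE with SE assumed to be SD/√n, and counts how often the interval contains the true mean.

```python
import random
import statistics

random.seed(1)                      # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 8.0, 2.5       # hypothetical population values
SAMPLE_SIZE, STUDIES = 100, 2000    # hypothetical study design

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    low, high = mean - 1.96 * se, mean + 1.96 * se
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / STUDIES
print(f"{100 * coverage:.1f}% of the intervals contain the true mean")
```

The share should come out close to 95%, not exactly 95%, because the simulation itself is subject to chance.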

Confidence intervals (general, most common calculation)

95% CI: $\bar{X} \pm 1.96 \cdot SE$, giving the limits $\bar{X}_{min}$; $\bar{X}_{max}$
Note: for normal distribution, when n is large

95% CI: $p \pm 1.96 \cdot SE$, giving the limits $p_{min}$; $p_{max}$
Note: when p and 1 - p > 5/n

Standard error (SE)
- Numeric data ($\bar{X}$)
- Categorical data (p)
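The SE formulas on this slide did not survive in the transcript. The sketch below assumes the standard textbook forms, SE = SD/√n for a mean and SE = √(p(1 - p)/n) for a proportion, and applies the 95% CI rule (estimate ± 1.96 SE) to two examples from earlier slides: the litas data (mean 8, SD about 2.5, 197 students) and the satisfaction table (40 of 50 satisfied). The function names are illustrative.

```python
import math


def ci_mean(mean, sd, n, z=1.96):
    """95% CI for a mean, assuming SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_proportion(p, n, z=1.96):
    """95% CI for a proportion, assuming SE = sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Litas example: mean 8, SD 2.5, 197 students.
mean_low, mean_high = ci_mean(8, 2.5, 197)
# Satisfaction example: 40 of 50 respondents were satisfied.
p_low, p_high = ci_proportion(40 / 50, 50)

print(f"mean: {mean_low:.2f} to {mean_high:.2f}")
print(f"proportion: {p_low:.3f} to {p_high:.3f}")
```

In the satisfaction example both p = 0.8 and 1 - p = 0.2 exceed 5/n = 0.1, so the condition for the proportion formula is met.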

Width of confidence interval depends on:

a) Sample size;
b) Confidence level (guarantee: usually 95%, but any % is possible);
c) Dispersion.
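How each of the three factors changes the width can be sketched for a CI of a mean, assuming the normal approximation with SE = SD/√n. The helper `ci_width` and all input values are hypothetical.

```python
from statistics import NormalDist


def ci_width(sd, n, confidence=0.95):
    """Full width of a normal-approximation CI for a mean (SE = SD / sqrt(n))."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return 2 * z * sd / n ** 0.5


base = ci_width(sd=2.5, n=100)
print(f"baseline (SD 2.5, n 100, 95%): {base:.2f}")
print(f"larger sample (n 400):         {ci_width(2.5, 400):.2f}")
print(f"higher confidence (99%):       {ci_width(2.5, 100, 0.99):.2f}")
print(f"more dispersion (SD 5):        {ci_width(5.0, 100):.2f}")
```

Under these assumptions, quadrupling the sample size halves the width, while a higher confidence level or a larger SD widens the interval.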

Hypothesis testing

$H_0$: $\mu_1 = \mu_2$; $p_1 = p_2$ (RR = 1, OR = 1, difference = 0)

$H_A$: $\mu_1 \neq \mu_2$; $p_1 \neq p_2$ (two-sided, one-sided)

Hypothesis testing

Significance level $\alpha$ (by convention 0.05).

Test for the P value (t-test, $\chi^2$, etc.).

The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true.

OR

The P value is the probability of getting such a difference (association) due to chance alone, when the null hypothesis is true.

Statistical agreements

If P < 0.05, we say that the results can't be explained by chance alone, therefore we reject $H_0$ and accept $H_A$.

If P ≥ 0.05, we say that the difference found can be due to chance alone, therefore we don't reject $H_0$.
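The decision rule can be sketched with a two-sided z test for two proportions, applied to the satisfaction table (14 of 18 males and 26 of 32 females satisfied). The choice of test and the function name are assumptions for illustration; with counts this small the normal approximation is rough and an exact test would usually be preferred.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


ALPHA = 0.05
# Satisfied respondents: 14 of 18 males, 26 of 32 females.
z, p_value = two_proportion_z_test(14, 18, 26, 32)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"z = {z:.2f}, P = {p_value:.2f}: {decision}")
```

Here P is well above 0.05, so the difference in satisfaction between males and females can be due to chance alone and $H_0$ is not rejected.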

Tests

Test depends on:
- Study design,
- Variable type
- Distribution,
- Number of groups, etc.

Tests (probability distributions):
- z test
- t test (one sample, two independent, paired)
- $\chi^2$ (+ trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon
- and others.

Inferential statistics: summary

The P value tells whether there is a statistically significant difference (association).

The CI gives the interval where the true value can be.

Inferential statistics: summary

Neither the P value nor the CI gives other explanations of the result (bias and confounding).

Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.