1 Biostatistics I using SPSS Date: September 13, 2005 Instructor: Ayumi Shintani, Ph.D., M.P.H....


Biostatistics I using SPSS

Date: September 13, 2005

Instructor: Ayumi Shintani, Ph.D., M.P.H.
Department of Biostatistics, Vanderbilt University

E-mail: [email protected]

The Master of Science in Clinical Investigation Program
Vanderbilt University Medical Center

Graphical Display of Data Part 1

Overview:
3.1 Categorical
3.2 Continuous
3.2.1 Histograms
3.2.2 Stem-&-Leaf Plots
3.2.3 Boxplots
3.2.4 Dotplots
3.2.5 Error bar charts
3.2.6 Error bar charts with lines
3.2.7 Pie-charts

Graphical Display of Data Part 2

Overview:
3.2.8 Simple Scatterplot
3.2.8.1 Labeling points
3.2.8.2 Identifying different groups for scatterplot
3.2.8.3 Representing Multiple Points
3.2.9 Scatterplot Matrix
3.2.9.1 Adding lines into scatter plots
3.2.9.2 Overlay plot with Loess Smoothers
3.2.10 Three-dimensional Scatterplot

Graphs are pictorial representations of numerical data:

“A picture is worth a thousand t-tests.”

Graphical displays should:

• Easily convey characteristics of the data.

• Present many numbers in a small space.

• Make large datasets coherent.

• Encourage the eye to compare different sections of data.

• Be closely integrated with the statistical and verbal descriptions of the dataset.

• Be clearly labeled for easy understanding.

Mean log dose of sedative and analgesic medications administered during 24-hour period prior to cognitive assessment *

24-hour Transition      N     Lorazepam dose   Fentanyl dose   Morphine dose   Propofol dose
                              +/- SD           +/- SD          +/- SD          +/- SD
Normal to Normal        97    0.2±4.0          0.1±2.7         0.2±5.7         0.0
Normal to Delirium      17    0.5±7.3          0.1±1.5         0.1±3.1         0.2±9.2
Normal to Coma          3     6.3±1.3          0.4±9.5         0.3±5.8         0.0
Delirium to Normal      62    0.2±4.3          0.1±3.1         0.2±5.8         0.0
Delirium to Delirium    197   0.5±8.4          0.2±4.5         0.3±10.5        0.1±5.5
Delirium to Coma        51    1.3±9.1          0.4±7.3         0.5±13.8        0.0
Coma to Normal          13    0.6±7.2          0.2±4.5         0.1±2.8         0.2±18.0
Coma to Delirium        89    0.7±10.4         0.3±5.8         0.3±11.7        0.1±4.0
Coma to Coma            167   1.4±14.2         0.4±7.4         0.4±11.5        0.2±12.6
Total                   696

[Figure: Error bar chart of mean lorazepam dose (mg) in 24 hours (y-axis from 0 to 30) by current cognitive status (C, D, N), grouped by previous cognitive status: Coma (C), Delirium (D), Normal (N). Error bars show 95.0% CI of mean.]


3.1 Graphical Display of Categorical Data

A histogram graphically displays the frequency distribution of categorical and continuous data. For categorical data, it is also called a bar diagram, bar chart, or bar graph.

In medical papers, categorical data are very rarely graphically displayed. However, for posters, such graphical displays are typically more eye-catching than a table.

• The x-axis denotes each value of the categorical variable.
• A vertical bar is drawn for each category. The bar can denote:
   • Frequency (number of observations having that categorical value).
   • Fraction (proportion of total observations having that categorical value).
   • Cumulative frequency (each bar represents the total number of patients who fall in that category or in lower-ordered categories).
   • Mean (or another summary measure) of another variable for the category.
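The four bar heights listed above can be computed directly from raw data. The following is a minimal Python sketch (not SPSS syntax); the category names, the values, and the helper `bar_heights` are hypothetical and only illustrate the definitions.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: (category, continuous value) for each patient.
levels = ["Low", "Mid", "High"]  # categories in their natural order
records = [("Low", 9.0), ("Low", 11.0), ("Mid", 8.0),
           ("Mid", 10.0), ("Mid", 12.0), ("High", 7.0)]

def bar_heights(records, levels):
    """Return frequency, fraction, cumulative frequency, and group mean per category."""
    counts = Counter(category for category, _ in records)
    total = len(records)
    frequency = [counts[level] for level in levels]
    fraction = [count / total for count in frequency]
    cumulative = list(accumulate(frequency))  # running total over ordered categories
    means = []
    for level in levels:
        values = [value for category, value in records if category == level]
        means.append(sum(values) / len(values) if values else None)
    return frequency, fraction, cumulative, means

frequency, fraction, cumulative, means = bar_heights(records, levels)
print(frequency)   # [2, 3, 1]
print(cumulative)  # [2, 5, 6]
print(means)       # [10.0, 10.0, 7.0]
```

Cumulative frequency only makes sense when the categories have a natural order, which is why the order of `levels` is passed in explicitly.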


How to obtain Histogram in SPSS using Graph Option (1)

In SPSS, open Rothman.sav, then go to Graphs (not Interactive), Bar Charts, and select Simple.

How to obtain Histogram in SPSS using Graph Option (2): Frequency

A frequency distribution is shown when each bar gives the number of observations having that categorical value.


SPSS screen shot: Frequency

How to obtain Histogram in SPSS using Graph Option (3): Fraction

A fraction is shown when each bar represents the proportion of total observations having that categorical value.


SPSS screen shot: Fraction

How to obtain Histogram in SPSS using Graph Option (4): Cumulative Frequency

Cumulative frequency is shown when each bar represents the total number of patients who fall in that category or in lower-ordered categories.

Cumulative Frequency

How to obtain Histogram in SPSS using Graph Option (5): Group Means

Each bar represents the mean of another (continuous) variable for the category.


Group Means

How to obtain Histogram in SPSS using Interactive Graph Option (1): Group Means

[Figure: Bar chart of counts by Education. Categories: 8th degree or less, Some High School, High School Grad, Some College, College Grad or above. Y-axis: Count, from 0 to 60. Bars show counts. Group sizes shown under the bars: n=34, n=42, n=65, n=39, n=12.]


Using Interactive graphics: In SPSS, go to: Graphs, Interactive, Bar, …

Frequency using Interactive Graph Option

How to obtain Histogram in SPSS using Interactive Graph Option (2): Group Means with Error Bars

[Figure: Bar chart of Baseline HbA1c (y-axis from 0.0 to 12.0) by Education. Categories: 8th degree or less, Some High School, High School Grad, Some College, College Grad or above. Group sizes shown under the bars: n=34, n=42, n=65, n=39, n=12.]

Note: I don’t personally recommend this type of graph.


Using Interactive graphics: In SPSS, go to: Graphs, Interactive, Bar, …

Group Means with Error Bars (1)


Group Means with Error Bars (2)


3.2 Graphical Displays of Continuous Data

3.2.1 Histograms

A histogram displays the frequency distribution of continuous data.

However, in contrast to categorical data, continuous data must first be grouped into intervals (bins), and the number of groups must be chosen, which is subjective.
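To make the grouping step concrete, here is a small Python sketch of equal-width binning. The ages and the helper `histogram_counts` are hypothetical, and the binning rule (equal widths from minimum to maximum) is an assumption for illustration; it is not claimed to be the rule SPSS uses.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        # Guard against zero width (all values equal) and keep the maximum in the last bin.
        index = 0 if width == 0 else min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical ages in years.
ages = [31, 35, 38, 42, 45, 47, 51, 52, 55, 56, 58, 59, 61, 64, 70, 80]

print(histogram_counts(ages, 5))  # [3, 3, 6, 3, 1]
print(histogram_counts(ages, 2))  # [9, 7]
```

The same data give a different picture with 2 bins than with 5, which is the sense in which the choice of the number of groups is subjective.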

How to obtain Continuous Data Histogram using Interactive Graph Option (1): Frequency Distribution

[Figure: Histogram of age (yrs); x-axis from 30 to 80, y-axis Count from 0 to 30.]

In SPSS, read Rothman.sav, then go to: Graphs, Interactive, Histogram

Frequency Distribution for Continuous Data (1)

Frequency Distribution for Continuous Data (2)


What kinds of things should I look for in a histogram?

1. Look for cases with values very different from the rest.

2. Look at whether the distribution is symmetric (normality).

3. Look for separate clusters of data values. For example, you may see two clusters, i.e., peaks. One peak may be from male patients and the other from female patients. In such a situation, you may want to analyze the data separately for males and females.

Editing Histogram (1): Adding normality curve

[Figure: Histogram of age (yrs); x-axis from 30 to 80, y-axis Count from 0 to 30.]

In SPSS, read Rothman.sav, then go to: Graphs, Interactive, and select Histogram. Click on the Histogram dialog box.

Adding Normal Curve to Histogram

Editing bin size on histogram (1)

In SPSS, after you create a histogram using interactive graphs, double-click on the figure to open the Chart Editor. Click Interval Tool.


NOTE: Without specification, SPSS automatically determines the number of groups (bins).

Editing bin size on histogram (2)

What will happen if you use a smaller number of bins?

Which histogram do you find more useful?

[Figure: Three histograms of age (yrs), x-axis from 30 to 80, drawn with #bins=5 (Count axis from 0 to 75), #bins=50 (Count axis from 0 to 12), and #bins=20 (Count axis from 0 to 30).]

Now, consider histograms of age stratified by study arms:

[Figure: Histograms of age (yrs) in two panels, Control and Intervention; x-axis from 30 to 80, y-axis Count from 0 to 15.]

Important: Whenever you are interested in comparing a continuous variable between groups, you must look at the data separately for each group.

Histogram of Age Stratified by Status


3.2.2 Stem-&-Leaf Plots

A useful way of tabulating the original data and, at the same time, depicting the general shape of the frequency distribution.

The stem consists of all but the rightmost digit of the data.

The leaf represents the rightmost digit.

age (yrs) Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     2.00 Extremes    (=<21)
     3.00        2 .  588
     4.00        3 .  1233
    10.00        3 .  5577888999
    13.00        4 .  0000113333344
    28.00        4 .  5555556666677777778888999999
    30.00        5 .  000000111111122222222333333444
    42.00        5 .  555555566666677777778888889999999999999999
    25.00        6 .  0000111122222233333344444
    14.00        6 .  55566666777778
     9.00        7 .  000112234
    12.00        7 .  555666777889
     1.00 Extremes    (>=87)

 Stem width:  10
 Each leaf:   1 case(s)

Question: What are the exact values of age for patients 20 years or older and less than 30 years old?

A stem-and-leaf plot, like a histogram, shows how many cases have various data values. A stem-and-leaf plot preserves more information than a histogram because it does not use the same symbol to represent all cases. Instead, the symbol depends on the actual value for a case.
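The stem and leaf split described above can be sketched in a few lines of Python. The helper `stem_and_leaf` and the ages are hypothetical; unlike the SPSS output, this sketch uses one row per stem rather than splitting each stem into two rows, and it assumes non-negative integers with a stem width of 10.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of leaves (units digits)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 28 -> stem 2, leaf 8
        rows[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}

# Hypothetical ages in years.
ages = [28, 25, 33, 28, 31, 40, 35, 32, 33, 41]
plot = stem_and_leaf(ages)

for stem, leaves in plot.items():
    # Frequency, stem, then the leaves, loosely following the SPSS layout.
    print(f"{len(leaves):6.2f}  {stem} . {leaves}")
```

Reading a row backwards recovers the data: stem 2 with leaves 588 stands for the values 25, 28, and 28.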

Stem-&-leaf plot of patient’s age.

In SPSS, go to: Analyze, Descriptive Statistics, Explore


3.2.3 Box Plots / Box-and-Whisker plot

A graphical summary for continuous data using percentiles

Bar charts and histograms are convenient for displaying summary information about data, but they provide very little information about anything other than the values of the measure. Box plots are a popular way to summarize data: they simultaneously display the median, the interquartile range, and the smallest and largest values of the data. A useful application of box plots is to graphically compare the distribution of a continuous measure across different levels of a categorical variable.

[Figure: Box plots of 12 Month HbA1c (y-axis from 5.0 to 15.0) by Study Status (Control, Intervention) and insulin use at enrollment (Non-User, User). Outliers are hidden; extreme values are hidden.]

Annotations on the figure:
• 75th percentile
• 50th percentile / median
• 25th percentile
• “Whiskers” extend to the largest and smallest observed values within 1.5 box lengths

How do you interpret these box plots?

Extreme values: observed values more than 3 box lengths from the upper (75th percentile) or lower (25th percentile) edge of the box.

Outliers: observed values more than 1.5 and less than 3 box lengths from the upper (75th percentile) or lower (25th percentile) edge of the box.

[Figure: Box plot annotated with distances of 1.5 box lengths and 3 box lengths from the box.]
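The outlier and extreme-value rules above can be applied by hand. The Python sketch below is an illustration under stated assumptions: the data and the helper `box_plot_summary` are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which may differ slightly from the percentile rule SPSS uses.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, outliers, and extreme values for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    inside, outliers, extremes = [], [], []
    for value in values:
        # Distance beyond the nearer box edge; zero or negative inside the box.
        distance = max(q1 - value, value - q3)
        if distance > 3 * box_length:
            extremes.append(value)
        elif distance > 1.5 * box_length:
            outliers.append(value)
        else:
            inside.append(value)
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers reach the most distant values still within 1.5 box lengths.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers, "extremes": extremes,
    }

# Hypothetical measurements.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
summary = box_plot_summary(values)
print(summary)
```

With these values the box runs from 3.5 to 8.5, so the box length is 5: the value 20 lies 11.5 above the box (an outlier) and 40 lies 31.5 above it (an extreme value).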


How to obtain Box-plot using SPSS (1):


How to obtain Box-plot using SPSS (2):

Then click Boxes to go to the next page.


How to obtain Box-plot using SPSS (3):

What can you tell from a box plot?

• From the median, you can get an idea of the typical value (central tendency).

• From the length of the box, you can see how much the values vary (data dispersion).

If the median line is not in the center of the box, you can tell that the distribution of your data values is not symmetric.

If the median is closer to the bottom of the box than to the top, there is a tail toward large values (positive skewness).

If the median is closer to the top of the box than to the bottom, there is a tail toward smaller values (negative skewness).
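The rule of thumb about the median's position in the box can be written as a short check. This Python sketch is illustrative only: the helper `box_skew_direction` and the data are hypothetical, and the quartiles use `statistics.quantiles` with `method="inclusive"` as an assumed percentile rule.

```python
import statistics

def box_skew_direction(values):
    """Compare the median's distance to the bottom and top of the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower_part = median - q1  # median to bottom of box
    upper_part = q3 - median  # median to top of box
    if upper_part > lower_part:
        return "positive"   # median nearer the bottom: tail toward large values
    if lower_part > upper_part:
        return "negative"   # median nearer the top: tail toward small values
    return "symmetric"

# Hypothetical samples.
right_tailed = [1, 2, 2, 3, 3, 4, 6, 9, 15]
left_tailed = [-v for v in right_tailed]

print(box_skew_direction(right_tailed))  # positive
print(box_skew_direction(left_tailed))   # negative
```

This looks only at the middle half of the data, so it is a quick visual heuristic rather than a formal measure of skewness.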

Let’s compare box plots with other methods.

Using histograms:

[Figure: Histograms of 12 Month HbA1c (x-axis from 6.0 to 14.0, y-axis Count from 0 to 12) in four panels: Non-User Control, User Control, Non-User Intervention, User Intervention.]

Using bar-chart for mean of 12 month HbA1c

[Figure: Bar chart of mean 12 Month HbA1c (y-axis from 2.0 to 10.0) by Study Status (Control, Intervention) and insulin use at enrollment (Non-User, User). Bars show means; error bars show mean +/- 1.0 SD. Group sizes shown under the bars: n=60, n=35, n=60, n=38.]

Let’s discuss pros and cons of each method of graphics.

Checking for Normality of Data in SPSS

How do we know if data are normally distributed? SPSS has nice features for testing and visually diagnosing normality.

In SPSS, open Rothman.sav and go to: Analyze, Descriptive Statistics, Explore.
Put ranChisq and ranNorm into the Dependent List box.
Click on Plots. In the Plots dialog box, select Normality plots with tests.

Checking Normality (1)

Checking Normality (2)

Checking Normality (3)

SPSS Output from Explore: Skewed Data (1)

ranChisq Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

    65.00        0 .  00000000000000000011111111111111
    21.00        0 .  2222233333
    26.00        0 .  444444455555
    18.00        0 .  666666777
    14.00        0 .  888999
     7.00        1 .  000&
    10.00        1 .  2333
     4.00        1 .  5&
     4.00        1 .  67
     5.00        1 .  88&
     3.00        2 .  1
     3.00        2 .  3&
     1.00        2 .  &
    12.00 Extremes    (>=2.5)

 Stem width:  1.00
 Each leaf:   2 case(s)

 & denotes fractional leaves.


SPSS Output from Explore : Skewed Data (2)

SPSS Output from Explore: Skewed Data (3)

Formal Statistical Test for Normality

Tests of Normality for RanChisq

             Kolmogorov-Smirnov(a)           Shapiro-Wilk
             Statistic   df    Sig.     Statistic   df    Sig.
ranChisq     .214        193   .000     .729        193   .000

a. Lilliefors Significance Correction
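To give a feel for what the Kolmogorov-Smirnov statistic in the table measures, the Python sketch below computes the largest gap between a sample's empirical distribution and a normal distribution fitted with the sample mean and SD. The helper `ks_statistic_normal` and both samples are hypothetical (they are not the ranNorm and ranChisq variables), and the sketch does not compute the significance level or the Lilliefors correction.

```python
from math import exp
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    largest_gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        largest_gap = max(largest_gap, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return largest_gap

# Hypothetical samples: evenly spaced normal quantiles, and their
# exponentials, which have a long right tail.
symmetric = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [exp(z) for z in symmetric]

d_symmetric = ks_statistic_normal(symmetric)
d_skewed = ks_statistic_normal(skewed)
print(round(d_symmetric, 3), round(d_skewed, 3))
```

A larger statistic means the data sit further from the fitted normal curve, so the skewed sample should give the larger of the two values.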

SPSS Output from Explore: Normally Distributed Data (1)

ranNorm Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     2.00       -2 .  55
     3.00       -2 .  223
     5.00       -1 .  57789
    16.00       -1 .  0000011112222233
    30.00       -0 .  555556666677777777888888999999
    34.00       -0 .  0000011111111111112222333334444444
    45.00        0 .  000000000000011111112222222222223333333333444
    26.00        0 .  55555556666777777788888899
    19.00        1 .  0000000000123333444
     7.00        1 .  5566777
     6.00        2 .  011222

 Stem width:  1.00
 Each leaf:   1 case(s)

Tests of Normality for ranNorm

             Kolmogorov-Smirnov(a)           Shapiro-Wilk
             Statistic   df    Sig.     Statistic   df    Sig.
ranNorm      .040        193   .200*    .993        193   .440

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction


SPSS Output from Explore : Normally Distributed Data (2)

Data transformation to achieve normality

Many types of laboratory data, specifically concentrations of a substance or lengths of duration, have a skewed distribution.

A transformation, such as taking the logarithm, sometimes makes these skewed variables approximately normally (Gaussian) distributed.

In SPSS, use the Transform, Compute dialog box to transform the baseline HbA1c value into the log(e) scale. Then compare the distributions of the untransformed and transformed data.

[Figure: Two histograms (y-axis Count from 5 to 25): 12 Month HbA1c (x-axis from 6.0 to 14.0) and logHa1c12 (x-axis from 1.80 to 2.60).]
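The effect of a log transformation on a skewed variable can be checked numerically as well as visually. The following Python sketch uses made-up values and a hypothetical helper `skewness` (the moment-based sample skewness); it is not the HbA1c data from Rothman.sav.

```python
import math
from statistics import mean

def skewness(values):
    """Moment-based skewness: near 0 if symmetric, positive for a long right tail."""
    center = mean(values)
    n = len(values)
    second_moment = sum((v - center) ** 2 for v in values) / n
    third_moment = sum((v - center) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5

# Hypothetical right-skewed measurements (evenly spaced on the log scale).
raw = [math.exp(k / 4) for k in range(-8, 9)]
logged = [math.log(v) for v in raw]  # natural log, i.e. the log(e) scale

print(round(skewness(raw), 2))     # clearly positive
print(round(skewness(logged), 2))  # approximately zero
```

Not every skewed variable becomes symmetric after taking logs, which is why the slide says "sometimes" and asks you to compare the two distributions.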


3.2.4 Dotplots

Similar to a stem-&-leaf plot (or a histogram displayed vertically), but with the data expressed using dots.

[Figure: Dotplot of 12 Month HbA1c (x-axis from 5.0 to 15.0, y-axis Count from 2 to 10). Dots/lines show counts.]

Similar to box plots, dotplots are useful for comparing distributions of a continuous measure across different levels of a categorical variable.


Dotplots of 12 month HbA1c stratified by Study arm and insulin use:


How to obtain dot plot in SPSS (1)


How to obtain dot plot in SPSS (2)

3.2.5. Error Bar Chart

[Figure: Error bar charts of Baseline HbA1c (y-axis from 9.5 to 11.5) by Study Status (Control, Intervention), for Non-User and User. Error bars show 95.0% CI of mean.]

How to obtain Error Bar Chart in SPSS (1)

Read Rothman.sav into SPSS, then go to: Graphs, Interactive, Error bar...

How to obtain Error Bar Chart in SPSS (2)

Select a set of Ha1c as the Y-axis variable.
Select Status as the X-axis variable.
Click on Error bars, select Display error bars, then click OK.
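An error bar showing a 95% CI of the mean marks the mean plus or minus a multiple of its standard error. The Python sketch below uses a normal-approximation multiplier (about 1.96), which is an assumption made to keep the example within the standard library; a t-based multiplier gives a somewhat wider interval for small groups. The values and the helper `mean_ci` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Mean with a normal-approximation confidence interval."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / n ** 0.5
    multiplier = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = multiplier * standard_error
    return center, center - half_width, center + half_width

# Hypothetical HbA1c values for one group.
group = [9.0, 10.0, 11.0, 10.0]
center, lower, upper = mean_ci(group)
print(center, round(lower, 2), round(upper, 2))
```

Unlike a mean +/- 1 SD bar, which describes the spread of individual values, this interval narrows as the group size grows.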


3.2.6. Error bar chart with line:


How to obtain Error Bar Chart with Line in SPSS (1)


How to obtain Error Bar Chart with Line in SPSS (2)


How to obtain Error Bar Chart with Line in SPSS (3)


How to obtain Error Bar Chart with Line in SPSS (4)


How to obtain Error Bar Chart with Line in SPSS (5)

Editing Error Bar Chart with Lines: Editing Connecting lines (1)

Double-click on the error bar chart to open the Chart Editor.
In the Chart Editor, click on the object you want to edit. Here we want to edit the lines, so click on the lines. Change the Dot and Line size.
Click on an error bar. In the error bar dialog box, click on Width to fix the gap between the connecting lines and the error bars. Move the cursor for Cluster to 10%.


Editing Error Bar Chart with Lines: Editing Connecting lines (2)


3.2.7. Never use Pie charts.

[Figure: Pie chart with slices for categories 1.00 to 7.00 of VAR00001. Pies show sums of VAR00002.]

Which category (from 1 to 7) do you think is the largest?

Redoing the pie chart from the previous page using a bar chart and a line chart.

[Figure: Bar chart of VAR00002 (y-axis from 10.00 to 40.00) by Case (1 to 10). Bars show means.]

In SPSS, go to: Graphs, Interactive, Bar,


Creating a bar graph directly from each data point.

Redoing the pie chart using a line chart.

[Figure: Line chart of VAR00002 (y-axis from 0.00 to 40.00) by Case (1 to 10).]

Creating a line graph directly from each data point.

In SPSS, go to: Graphs, Interactive, Bar,


3.2.8 Scatterplots

One of the best ways to look for relationships and patterns among multiple continuous variables.

Each point represents a pair of values. One variable is represented by the x-axis and the other by the y-axis.

In the previous lecture, you used a variety of graphical displays to summarize a single variable. In this lecture, we will learn how to display the values of two variables on a meaningful scale.

[Figure: Scatterplot. The circled point represents ID=216: Baseline HbA1c=21.1%, 12 month HbA1c=13.5%.]

How to obtain the scatter plot in SPSS (1)

• Read Rothman.sav into SPSS.
• To produce a scatterplot of 12 month HbA1c by baseline HbA1c, from the menus choose:
  Graphs, Scatter/Dot... {uses non-interactive mode this time}
• Select simple scatter plot.
• Click Define.


How to obtain the scatter plot in SPSS (1)

What can you tell from the scatterplot?

The points are not randomly scattered over the grid. There seems to be a pattern.

The points are concentrated in a band running from the bottom left to the top right, indicating that as the baseline HbA1c value increases, the 12 month value increases. That is, a straight line might be a reasonable summary of the data.

You can also determine whether there are cases that have unusual combinations of values for the two variables. You may want to validate the observation for ID=216: is it clinically plausible to have Baseline HbA1c=21.1% with 12 month HbA1c=13.5%?
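The idea that a straight line might summarize the scatterplot can be made concrete with a least-squares fit and a correlation coefficient. This Python sketch is only an illustration: the paired values and the helper `fit_line` are hypothetical and are not taken from Rothman.sav.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x, plus the Pearson correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / (sxx * syy) ** 0.5
    return slope, intercept, correlation

# Hypothetical paired measurements: baseline (x) and 12 month (y) values.
baseline = [8.0, 9.0, 10.0, 11.0, 12.0]
month12 = [7.0, 7.5, 8.5, 8.5, 9.5]

slope, intercept, correlation = fit_line(baseline, month12)
print(round(slope, 2), round(intercept, 2), round(correlation, 3))
```

A positive slope matches the bottom-left to top-right pattern; a single unusual point can pull the line noticeably, which is one more reason to validate such observations.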


3.2.8.1 Labeling the Points

How to label a point in a scatter plot (1)

In order to add a label for the observed value on the next page, in the Simple Scatterplot dialog box:
  Select 12 Month HbA1c as the y variable and Baseline HbA1c as the x variable.
  Additionally, set ID under “case labeled by”.
  Click OK.

How to label a point in a scatter plot (2)

Double-click on the scatterplot to open the Chart Editor. In the Chart Editor, click on [toolbar icon not shown], then click on the point whose ID number you want to show.


3.2.8.2. Identifying different groups for scatterplot.


To identify points by study arm, select STATUS for Set Markers by, as shown below.

How to identify different groups for scatterplot

3.2.8.3. Representing Multiple Points

In the Chart Editor, double-click on any point in the figure. In the Properties dialog box, click the Point Bins tab. Under Display At, select Bins. Under Count Indicator, select Marker Size.

How to represent multiple points in scatter plot.
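The idea behind point bins is that cases sharing a location are counted and the count is shown through marker size. A minimal Python sketch of that idea follows; the data pairs, the rounding precision, and the square-root scaling are all assumptions for illustration and are not a description of how SPSS scales its markers.

```python
from collections import Counter

def bin_points(points, decimals=1):
    """Count how many cases share each (x, y) location after rounding."""
    return Counter((round(x, decimals), round(y, decimals)) for x, y in points)

def marker_sizes(counts, base=4.0):
    """Let marker area grow with the count, so the size grows with sqrt(count)."""
    return {loc: base * n ** 0.5 for loc, n in counts.items()}

# Hypothetical (baseline, 12-month) HbA1c pairs with some exact repeats.
points = [(7.0, 6.5), (7.0, 6.5), (7.0, 6.5), (8.2, 7.9), (8.2, 7.9), (9.1, 8.4)]
counts = bin_points(points)
for loc, size in marker_sizes(counts).items():
    print(loc, "n =", counts[loc], "marker size =", round(size, 2))
```

Without this step, three patients at the same location would be drawn as a single marker and the plot would understate how much data sits there.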

3.2.9. Scatterplot Matrices.

So far, we have looked at the relationship between two variables. What if you want to see how these variables relate to another variable? A scatterplot matrix is a display that contains scatterplots for all possible pairs of variables.

Is there any way to help understand the relationship between two variables?

How to obtain scatterplot matrices.

3.2.9.1. Adding a Lowess smoother to a scatterplot

Read Rothman.sav into SPSS and follow the instructions for scatterplots. After you create the scatterplot matrix:
* Activate the graph by double-clicking on it.
* Highlight all points in the Chart Editor.
* Click the Add Fit Line tool, click on Fit Line, then choose LOESS with % of points to fit = 50.

How to add a Lowess smoother to a scatterplot (1)

How to add a Lowess smoother to a scatterplot (2)
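To give a feel for what "% of points to fit = 50" controls, here is a rough Python sketch of a locally weighted linear fit: each fitted value comes from a weighted straight-line fit to the nearest half of the points. It assumes tricube weights and no robustness iterations, which may differ from what SPSS does internally, and the data are made up.

```python
import math

def loess(xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at each x (no robustness iterations)."""
    n = len(xs)
    q = max(2, math.ceil(frac * n))  # number of neighbours used in each local fit
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[q - 1]     # distance to the q-th nearest point
        if h > 0:
            # tricube weights: 1 at x0, falling to 0 at distance h
            ws = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dists]
        else:
            # q or more points coincide with x0: use only those points
            ws = [1.0 if d == 0 else 0.0 for d in dists]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical data: a curved trend that a single straight line would miss.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 4.2, 4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 5.1]
print([round(v, 2) for v in loess(xs, ys, frac=0.5)])
```

A smaller percentage follows local wiggles more closely; a larger one gives a smoother, flatter curve.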

A scatterplot matrix of 12 month HbA1c, 12 month systolic blood pressure, age, baseline BMI has the same number of rows and columns as there are variables. In this example, you see 5 rows and 5 columns. Each cell of the matrix, except for the cells on the diagonal, is a plot of a pair of variables.

What’s the easiest way to read a scatterplot matrix?

Try to scan across an entire row or column. For example, in the figure on the previous page, you will see that the 12 month HbA1c value correlates with the 6 month value but not much with the baseline value. Plots that mirror each other across the diagonal show the same pair of variables with the axes swapped, so you may want to ignore one plot of each pair.
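The layout described above can be counted directly. The short Python sketch below lists the off-diagonal cells and the distinct pairs for a 5-variable matrix; the names V1 to V5 are placeholders, not the variables in the course dataset.

```python
from itertools import combinations

def matrix_cells(variables):
    """Return (row_var, col_var) for every off-diagonal cell of the matrix."""
    return [(r, c) for r in variables for c in variables if r != c]

def unique_pairs(variables):
    """Each unordered pair appears twice in the matrix; list it once."""
    return list(combinations(variables, 2))

variables = ["V1", "V2", "V3", "V4", "V5"]  # placeholder names
k = len(variables)
print(k * k, "cells,", len(matrix_cells(variables)), "off-diagonal,",
      len(unique_pairs(variables)), "distinct pairs")
```

With 5 variables there are 20 off-diagonal plots but only 10 distinct pairs, which is why reading one triangle of the matrix is enough.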

3.2.9.2. Overlay Plots

The non-interactive graph option does not work well for this, so use interactive graphs.

[Figure: overlay scatterplot of Baseline HbA1c (y-axis, 8.0 to 20.0) against 12 Month HbA1c (x-axis, 5.0 to 15.0), with an LLR smoother for each Study Status group (Control, Intervention).]

In SPSS, go to Graphs, Interactive, Scatterplot… In the Scatterplot dialog box:
* Open the “Fit” dialog box by clicking the menu.
* Enter 5 into each bandwidth.
* Choose Subgroup under “Fit lines for”.

How to overlay 2 scatter plots (1)

How to overlay 2 scatter plots (2)
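Choosing Subgroup under “Fit lines for” means one fit per study arm rather than one fit for everyone. The Python sketch below shows only that grouping step. For brevity it fits an ordinary least-squares line to each group instead of the LLR smoother used on the slide, and the rows are made-up values.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_by_group(rows):
    """rows: (group, x, y) tuples -> {group: (intercept, slope)}."""
    groups = defaultdict(list)
    for g, x, y in rows:
        groups[g].append((x, y))
    return {g: fit_line([p[0] for p in pts], [p[1] for p in pts])
            for g, pts in groups.items()}

# Hypothetical (study status, 12-month HbA1c, baseline HbA1c) rows.
rows = [
    ("Control", 6.0, 8.0), ("Control", 8.0, 10.0), ("Control", 10.0, 12.0),
    ("Intervention", 6.0, 9.0), ("Intervention", 8.0, 12.0), ("Intervention", 10.0, 15.0),
]
for group, (a, b) in fit_by_group(rows).items():
    print(f"{group}: y = {a:.2f} + {b:.2f} x")
```

Overlaying the two fitted curves on one set of axes makes it easy to see whether the relationship differs between arms.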

3.2.10. Three dimensional Scatter Plots

The non-interactive graph option does not work well for this, so use interactive graphs.

In SPSS, go to Graphs, Interactive, Scatterplot… In the Scatterplot dialog box, select 3-D Coordinate, which gives you the option to add a third coordinate.

How to create three dimensional scatter plots

Compare the figures below. You may realize that it is very hard to understand the relationship between variables from the three-dimensional figure. You may instead want to show each pairwise relationship to describe the dynamic relationship. I recommend that you “never” use three-dimensional graphs. Use scatterplot matrices instead.

[Figure, three panels by age (< 50 years, 50-59 years, >= 60 years): percentage of subjects (y-axis, 0 to 90) in each Agatston score category (= 0, 1-109, > 109) for Control subjects, Early RA, and Established RA. Bar labels as extracted from the slide:
< 50 years: 29/35, 5/35, 1/35; 25/29, 4/29, 0/29; 14/19, 3/19, 2/19.
50-59 years: 16/30, 12/30, 2/30; 12/25, 9/25, 4/25; 6/19, 5/19, 8/19.
>= 60 years: 8/21, 4/21, 9/21; 3/16, 5/16, 8/16; 8/33, 6/33, 19/33.]

The prevalence of coronary-artery calcification among patients with rheumatoid arthritis and control subjects, according to age.

Example from a real practice: (Before paper revision)

[Figure: prevalence of coronary-artery calcification (%) (y-axis, 0 to 90) by age (x-axis: <50 years, 50-59 years, >60 years) for Controls, Early RA, and Established RA.]

The prevalence of coronary-artery calcification among patients with rheumatoid arthritis and control subjects, according to age.

Example from a real practice: (After paper revision)

There was a significant interaction between age and disease status (P-value for interaction <0.05). For age < 50 years and 50-59 years, the prevalence of coronary calcification was higher in patients with established RA than in controls (both P<0.05), but the difference was not significant in subjects > 60 years.
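The revised figure replaces three bars per group with a single prevalence. One plausible way to get from the "before" bar labels to that number is sketched below in Python. It rests on two assumptions about the extracted labels: that the first count in each triple is the Agatston score = 0 bar, and that the triples are ordered Control, Early RA, Established RA. Calcification is taken to mean any score above 0.

```python
from fractions import Fraction

# Bar labels from the "before" figure: (score = 0, score 1-109, score > 109).
# ASSUMPTION: first count in each triple is the score = 0 bar, and triples are
# ordered Control, Early RA, Established RA, as they appear in the extraction.
counts = {
    "< 50 years": {"Control": (29, 5, 1), "Early RA": (25, 4, 0),
                   "Established RA": (14, 3, 2)},
    "50-59 years": {"Control": (16, 12, 2), "Early RA": (12, 9, 4),
                    "Established RA": (6, 5, 8)},
    ">= 60 years": {"Control": (8, 4, 9), "Early RA": (3, 5, 8),
                    "Established RA": (8, 6, 19)},
}

def prevalence(triple):
    """Share of subjects with any calcification (Agatston score > 0)."""
    zero, low, high = triple
    return Fraction(low + high, zero + low + high)

for age, groups in counts.items():
    row = ", ".join(f"{g}: {100 * float(prevalence(t)):.1f}%"
                    for g, t in groups.items())
    print(f"{age}: {row}")
```

Collapsing to one number per group and age band is what lets the revised figure put all nine values on a single pair of axes.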