Effective Use of Graphs
Annie Herbert, Medical Statistician
Research & Development Support Unit, Salford Royal (Hope) Hospitals NHS Foundation Trust
[email protected] (0161 720) 2227
Timetable
Time      Task
60 mins   Presentation
20 mins   Coffee Break
90 mins   Practical Tasks in IT Room
Outline
• Graphs for categorical data
• Graphs for numerical data
• Comparing groups
• Additional graphs (covered in other courses)
• Final tips & Computer packages
Categorical Data (1)
Examples:
• Sex
– Male/Female
• Blood Group
– A/B/AB/O
• Employment Status
– Unemployed/Part-time/Full-time
Categorical Data (2)
• Record: Frequency (discrete number) per category
• Summary: Frequency OR percentage/fraction/proportion
• Visually:
- Bar Chart
- Pie Chart
[Bar chart: Official Employment Status of Population of Camberwick Green. x-axis: Employment Type (Unemployed, Part-time, Full-time); y-axis: Frequency (0 to 2500)]
[Pie chart: Official Employment Status of Population of Camberwick Green. Segments: Unemployed, Part-time, Full-time]
Example – Discharge Destination (1)
Where Patient Lives   n = 731
Alone                 339 (46.4%)
Family                210 (28.7%)
Home                  180 (24.6%)
Other                 2 (0.3%)
Example – Discharge Destination (2)
[Bar chart: Discharge Destinations of Patients. x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency (0 to 400)]
Example – Psychiatric Illness/ Discharge Destination (1)
Where Patient Lives   Psychiatric Illness?
                      No (n=208)   Yes (n=523)
Alone                 117 (56%)    222 (42%)
Family                81 (39%)     129 (25%)
Home                  9 (4%)       171 (33%)
Other                 1 (0%)       1 (0%)
Example – Psychiatric Illness/ Discharge Destination (2)
                       Where Patient Lives
Psychiatric Illness?   Alone (n=339)   Family (n=210)   Home (n=180)   Other (n=2)
No                     117 (35%)       81 (39%)         9 (5%)         1 (50%)
Yes                    222 (65%)       129 (61%)        171 (95%)      1 (50%)
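The two tables above show the same counts with different totals as the denominator: table (1) gives percentages within each psychiatric illness group, table (2) within each discharge destination. A minimal sketch of both calculations, using the counts from the tables; the function names are illustrative only.

```python
# Counts from the tables above: psychiatric illness (No/Yes) by where the patient lives.
COUNTS = {
    "No":  {"Alone": 117, "Family": 81,  "Home": 9,   "Other": 1},
    "Yes": {"Alone": 222, "Family": 129, "Home": 171, "Other": 1},
}


def percent(part, total):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / total)


def within_illness_group(table):
    """Percentages out of each illness group total, as in table (1)."""
    return {
        group: {dest: percent(n, sum(row.values())) for dest, n in row.items()}
        for group, row in table.items()
    }


def within_destination(table):
    """Percentages out of each destination total, as in table (2)."""
    totals = {}
    for row in table.values():
        for dest, n in row.items():
            totals[dest] = totals.get(dest, 0) + n
    return {
        group: {dest: percent(n, totals[dest]) for dest, n in row.items()}
        for group, row in table.items()
    }


by_group = within_illness_group(COUNTS)
by_destination = within_destination(COUNTS)
print("Of patients with psychiatric illness, % living in a home:", by_group["Yes"]["Home"])
print("Of patients living in a home, % with psychiatric illness:", by_destination["Yes"]["Home"])
```

The same count of 171 patients gives two very different percentages, which is why the total behind any percentage needs to be stated.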
Example – Psychiatric Illness/ Discharge Destination Bar Chart
[Grouped bar chart: Discharge Destination of Patients with and without Psychiatric Illness. x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency (0 to 250); bars: No, Yes]
Stacked Bar Chart
[Stacked bar chart: Discharge Destination of Patients with and without Psychiatric Illness. x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Percentage (0% to 100%); segments: Yes, No]
Re-ordering categories can emphasize a certain effect:
[Stacked bar chart: Discharge Destination of Patients with and without Psychiatric Illness. x-axis: Psychiatric Illness? (No, Yes); y-axis: Percentage (0% to 100%); legend order: Other, Home, Family, Alone]
[Stacked bar chart: same data and axes; legend order: Family, Other, Home, Alone]
The axis should always start from 0:
[Grouped bar chart: Discharge Destination (Alone, Family) for Patients with and without Psychiatric Illness. x-axis: Psychiatric Illness? (No, Yes); y-axis: Frequency, from 0 to 250; bars: Alone, Family]
[Grouped bar chart: same data; y-axis: Frequency, from 70 to 230]
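To see how much a y-axis that does not start from 0 can distort a comparison, compare the heights of the bars as drawn with the counts themselves. A minimal sketch, using the 'Alone' counts from the earlier table (117 without and 222 with psychiatric illness) and the truncated axis starting at 70; the function names are illustrative only.

```python
def drawn_height(value, axis_start=0):
    """Height of a bar as drawn when the y-axis begins at axis_start."""
    return value - axis_start


def apparent_ratio(larger, smaller, axis_start=0):
    """How many times taller the larger bar looks than the smaller one."""
    return drawn_height(larger, axis_start) / drawn_height(smaller, axis_start)


ALONE_NO, ALONE_YES = 117, 222

true_ratio = apparent_ratio(ALONE_YES, ALONE_NO)                      # axis from 0
truncated_ratio = apparent_ratio(ALONE_YES, ALONE_NO, axis_start=70)  # axis from 70

print(f"Axis from 0:  'Yes' bar is {true_ratio:.1f} times the 'No' bar")
print(f"Axis from 70: 'Yes' bar is {truncated_ratio:.1f} times the 'No' bar")
```

With the axis from 0 the bar heights are in the same ratio as the counts (just under 2 to 1); with the axis from 70 the 'Yes' bar is drawn more than 3 times as tall.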
Bar Charts – Adv & Disadv
• Advantages:
- Visually strong.
- Easy to compare between more than one dataset.
• Disadvantages:
- Categories can be ‘re-ordered’ to emphasize certain effects.
- Misleading if not used for counts.
- Misleading if y-axis not from 0.
Bar Charts – Things to consider:
• What group differences are you interested in?
• Frequencies or percentages? If percentage, it’s down to you to specify the totals.
• Is ‘Other’ a large frequency/percentage?
• Consider the categories as un-ordered when using a stacked bar chart.
Pie Charts
[Pie chart: Psychiatric Illness? No. Segments: Home, Alone, Family, Other]
[Pie chart: Psychiatric Illness? Yes. Segments: Home, Alone, Family, Other]
Pie Charts – Advantages:
• Easy to compare categories, as they are equidistant from each other.
• Ordering of categories does not emphasize certain effects as badly as stacked bar charts do.
Pie Charts – Disadvantages:
• No choice between frequencies and percentages (down to you to specify totals).
• Cannot put more than one data set into a pie chart.
• Lose individual values of the data.
• Limited space: if using more than 5 or 6 categories, chart can look complicated.
Numerical Data (1)
Examples:
• Weight
• Blood Pressure
• Cholesterol Levels
Numerical Data (2)
• Record: Number/Value (discrete or continuous)
• Summary:
- Mean (SD)
- Median (IQR)
• Visually:
- Histogram
- Box plot
- Spread plot
Data – Ages of Patients in Selenium Study
Age
48
36
56
66
65
19
36
59
48
52
67
39
28
58
48
49
39
57
62
74
59
66
45
69
55
63
42
68
54
24
19
70
73
29
34
50
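As a sketch of the two numerical summaries named earlier, Mean (SD) and Median (IQR), the ages above can be summarised with Python's standard library. The quartile rule used (the "inclusive" method) is an assumption; statistics packages differ slightly in how they compute quartiles, so other software may report slightly different values.

```python
from statistics import mean, median, quantiles, stdev

# The 36 patient ages listed above.
AGES = [48, 36, 56, 66, 65, 19, 36, 59, 48, 52, 67, 39,
        28, 58, 48, 49, 39, 57, 62, 74, 59, 66, 45, 69,
        55, 63, 42, 68, 54, 24, 19, 70, 73, 29, 34, 50]


def summarise(values):
    """Return the mean, sample SD, median and quartiles of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }


summary = summarise(AGES)
print(f"n = {summary['n']}")
print(f"Mean (SD): {summary['mean']:.1f} ({summary['sd']:.1f})")
print(f"Median (IQR): {summary['median']} ({summary['q1']} to {summary['q3']})")
```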
Histogram – Ages of Patients in Selenium Study
Histograms for the same data can vary:
Compromise:
Beware! Histogram is not Bar Chart
[Histogram: Length of stay (days). x-axis: 10 to 190 in steps of 20; y-axis: 0 to 400]
[Bar chart: Length of stay (days). Categories: <5, 5to7, 7to14, 15to30, 31to60, 61to120, >120; y-axis: Count (0 to 400)]
Histograms – Advantages:
• Visual display of interval frequencies, easy to compare intervals.
• Can give an idea of the distribution of the data, e.g. shape, typical value, spread.
Histograms – Disadvantages:
• Choice of interval width can alter appearance.
• Individual values lost.
• One data set per histogram, difficult to compare data sets.
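The effect of interval width can be checked by counting the same data into intervals of different widths. A minimal sketch using the ages from the Selenium Study data earlier in the talk; the function names, the starting point of 10 and the widths of 10 and 20 years are illustrative choices, not taken from the slides.

```python
from collections import Counter

# The 36 patient ages from the Selenium Study data.
AGES = [48, 36, 56, 66, 65, 19, 36, 59, 48, 52, 67, 39,
        28, 58, 48, 49, 39, 57, 62, 74, 59, 66, 45, 69,
        55, 63, 42, 68, 54, 24, 19, 70, 73, 29, 34, 50]


def bin_counts(values, width, start=0):
    """Count values per interval; keys are the lower limits of the intervals.

    Intervals containing no values are not listed.
    """
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


def text_histogram(values, width, start=0):
    """Print one row of '#' marks per interval (whole-number data assumed)."""
    for lower, n in bin_counts(values, width, start).items():
        print(f"{lower:>3}-{lower + width - 1:<3} {'#' * n}")


narrow = bin_counts(AGES, width=10, start=10)
wide = bin_counts(AGES, width=20, start=10)

print("Width 10:")
text_histogram(AGES, width=10, start=10)
print("Width 20:")
text_histogram(AGES, width=20, start=10)
```

With a width of 10 the counts rise gradually to a peak in the fifties; with a width of 20 most of that detail is merged into four bars.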
Box Plot
[Box plot diagram with labels: Upper Quartile, Median, Lower Quartile, Outlier, Extreme Outlier]
Box Plots – Advantages:
• Defines many summary statistics in one plot.
• Defines ‘outliers’ explicitly.
• Can have more than one data set in a plot, so easy to compare data sets:
Box Plots – Disadvantages:
• More complicated visually than some other types of data plots.
• Individual values lost.
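How a box plot decides which points are 'outliers' can be sketched as below. The cut-offs used, more than 1.5 × IQR beyond the quartiles for an outlier and more than 3 × IQR for an extreme outlier, are a common convention and an assumption here, since the slides do not state the rule; the data are hypothetical lengths of stay, not taken from the talk.

```python
from statistics import quantiles


def classify_outliers(values):
    """Split points lying outside the box plot fences into outliers and extremes.

    Assumed convention: 'outlier' is more than 1.5 x IQR beyond a quartile,
    'extreme' is more than 3 x IQR beyond a quartile.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    result = {"outlier": [], "extreme": []}
    for v in sorted(values):
        distance = max(q1 - v, v - q3)  # how far outside the box (<= 0 if inside)
        if distance > 3 * iqr:
            result["extreme"].append(v)
        elif distance > 1.5 * iqr:
            result["outlier"].append(v)
    return result


# Hypothetical lengths of stay in days.
stays = [2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
flagged = classify_outliers(stays)
print(flagged)
```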
Spread Plots (1)
Spread Plots (2)
• Advantages:
- Can give an idea of the distribution of the data, e.g. shape, typical value, spread.
- Shows individual values of the data.
- Can show more than one dataset in a plot.
• Disadvantages:
- Not very widely used in journal publications.
- Doesn’t explicitly summarise statistics or outliers as box plot does.
Relationships in Numerical Data
Serial Measurements
[Line graph: Mean TG (±standard error) at each time point. x-axis: Time (minutes), -100 to 650; y-axis: Mean TG (mM), 0 to 3]
[Line graph: Change of TG over time. x-axis: Time (minutes), -100 to 650; y-axis: TG (mM), 0 to 3; legend: E 1.1, E 2.2, E 3.2, E 4.1, E 5.1, E 6.1]
What information does this give?
Mean ± SE, n ≈ 30 per group
Better to look at individual data…
…or give a sensible summary.
Kaplan-Meier Curve (step graph)
Time-to-Event data.
[Step graph: Survival Plot (PL estimates). x-axis: Times (0 to 300); y-axis: Survivor (0.00 to 1.00); legend: 1, 0]
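The steps of a Kaplan-Meier curve come from the product-limit (PL) estimate: at each time an event occurs, the running survival proportion is multiplied by the fraction of those still at risk who did not have the event. A minimal sketch with hypothetical data (the times and event flags below are invented for illustration, and the function name is illustrative only):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, estimated proportion surviving beyond that time).
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data: 5 patients, 3 events, 2 censored.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"time {t}: {s:.2f}")
```

Censored patients do not cause a step down, but they do reduce the number at risk for later steps.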
Bland-Altman Plots (scatter plots)
How well do two methods of measurement agree?
[Scatter plot: Agreement Plot (95% limits of agreement). x-axis: mean (200 to 450); y-axis: difference (-100 to 100)]
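The quantities behind such a plot can be sketched as follows: each point is the mean of a pair of measurements plotted against their difference, and the 95% limits of agreement are the mean difference ± 1.96 standard deviations of the differences. The readings below are hypothetical and the function name is illustrative only.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return the plotted points, the mean difference (bias) and the 95% limits."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    points = [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the differences
    return points, bias, (bias - half_width, bias + half_width)


# Hypothetical paired readings from two methods of measurement.
method_a = [210, 250, 300, 350, 400]
method_b = [205, 260, 290, 355, 410]

points, bias, (lower, upper) = bland_altman(method_a, method_b)
print(f"Mean difference: {bias}")
print(f"95% limits of agreement: {lower:.1f} to {upper:.1f}")
```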
Forest Plots (Hi-Lo-Close charts)
Meta-Analysis.
[Forest (meta-analysis) plot. x-axis ticks: 0.2, 0.5, 1, 2. Plotted estimates (intervals):]
Pooled 0.75 (0.50, 1.14)
KW 0.80 (0.47, 1.36)
MT 0.80 (0.60, 1.07)
SW 0.68 (0.45, 1.03)
AH 0.72 (0.48, 1.08)
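A sketch of how the values above might be read, assuming they are ratio measures with 1 meaning 'no effect' (the slide does not say which measure was used). It checks whether each interval includes 1, and whether each estimate sits at the geometric midpoint of its interval, which is what makes the intervals look symmetric on a logarithmic axis such as 0.2, 0.5, 1, 2.

```python
from math import sqrt

# (estimate, lower limit, upper limit) as listed above.
RESULTS = {
    "AH": (0.72, 0.48, 1.08),
    "SW": (0.68, 0.45, 1.03),
    "MT": (0.80, 0.60, 1.07),
    "KW": (0.80, 0.47, 1.36),
    "Pooled": (0.75, 0.50, 1.14),
}


def includes_no_effect(lower, upper, null_value=1.0):
    """True if the interval contains the 'no effect' value (assumed to be 1)."""
    return lower <= null_value <= upper


def geometric_midpoint(lower, upper):
    """Midpoint of the interval on a logarithmic scale."""
    return sqrt(lower * upper)


for label, (estimate, lower, upper) in RESULTS.items():
    crosses = includes_no_effect(lower, upper)
    midpoint = geometric_midpoint(lower, upper)
    print(f"{label:<7} {estimate:.2f}  includes 1: {crosses}  log-scale midpoint: {midpoint:.2f}")
```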
Final Pointers:
• Before plotting think about the type of data and what you would like to compare.
• Show all data rather than summaries where possible.
• Label axes clearly. Graph should ‘stand alone’.
• Make sure when comparing groups that the outcome is on the same scale.
• Make sure any colours used are sufficiently different from each other, and not red/green.
Using a Computer Package:
Package       Advantages                                                   Disadvantages
SPSS          Produces journal quality graphs                              Difficult to start with; expensive
StatsDirect   When copied and pasted, these graphs may be edited in Word   Difficult to draw bar/pie charts
Excel         Easy to use for bar/pie charts                               Not a statistics package