3 Essentials for Better Graphs - gradschool.usu.edu...3 Essential steps for better data...
Transcript of 3 Essentials for Better Graphs - gradschool.usu.edu...3 Essential steps for better data...
1
3 Essential steps for better data visualizationDr. Abby Benninghoff
3 Essential steps for better data visualization. January 20, 2016
1 Use the right kind of plot for your data
2 Figures should elucidate information, not obfuscate
3 Apply principles of effective formatting
3 Essentials
2
Use the right kind of plot for your data
1
3 Essential steps for better data visualization. January 20, 2016
Pictures as figures• Never assume that anyone knows what
is in a picture– Use arrows, markers to identify features– Include scale bar– Specify meanings of colors– Include key explanations in figure legend
3
3 Essential steps for better data visualization. January 20, 2016
Use diagrams to convey complex ideasThis works. Particularly as an overview in an introduction section or a review paper.
But, probably too much for a quick slide in a presentation.
3 Essential steps for better data visualization. January 20, 2016
Use diagrams to convey complex ideas
This, not so much.
4
3 Essential steps for better data visualization. January 20, 2016
What type of graph?• Line graphs for dynamic comparisons
– Dependent variable changes with respect to time, dose, etc.
– Do not over-crowd with multiple (>4) lines– Use curve-fitting when appropriate
(provide curve fit model)
• Scatter plot for correlations– Most commonly made in two dimensions– Line of best fit – linear regression– Show r statistic, confidence interval lines or
other indication of robustness of the data
3 Essential steps for better data visualization. January 20, 2016
What type of graph?• Bar graph for subdividing and comparing
data– Preferred as vertical rather than horizontal– Used if no continuum of data points that would
work as a line graph– Make bars the same width, space between is
one-half a bar
• Pie chart to compare parts of a whole– Useful for visually showing how much subgroups
contribute to the whole– Problematic if too many groups are shown– Problematic if color not allowed (shading okay,
but patterns look bad)
5
3 Essential steps for better data visualization. January 20, 2016
What type of graph?• Box plot useful for descriptive
statistics– Will highlight differences in
data set populations– Box outline shows 25th to 75th
percentile
– Line shows the mean
– Error bars show a defined distance set by user (min to max, 10 to 90th percentile)
– Dots indicate possible outliers AIN93G TWD0.01
0.1
1
10
100
Tum
or v
olum
e (m
m3 )
Water Tart Cherry
aab
cbc
3 Essential steps for better data visualization. January 20, 2016
What type of graph?Contour plot
• Useful for multi-dimensional data
• Contour lines indicate regions that fit within a defined scale
• Shows scaled values in relationship to two experimental factors or other reference measurements on XY axes or other coordinate system
• Color is very effective in these plots
6
Graphs should elucidate information, not obfuscate
2
3 Essential steps for better data visualization. January 20, 2016
Do not manipulate to (de)emphasize observations• Do not manipulate the
figure to sway the reader or inappropriately emphasize or disguise trends
• Do not start axes at midpoint in scale without visual cue (e.g., axis break)
2 4 6 8 10 12
0
5
10
15
20
25
30
Weeks of age
Bod
y w
eigh
t (g)
AIN93G
DIOTWD
ab
4 8 12
0
5
10
15
20
25
30
Weeks of age
Bod
y w
eigh
t (g)
AIN93G
DIOTWD
ab
AIN93G TWD DIO
0
10
20
30
40
Fina
l bod
y w
eigh
t (g)
NoneDSS
FemaleMale FemaleMale FemaleMaleAIN93G TWD DIO
18
20
22
24
26
28
30
Fina
l bod
y w
eigh
t (g)
NoneDSS
FM FM FM FM FM FM
7
3 Essential steps for better data visualization. January 20, 2016
Including/excluding data• Do not arbitrarily delete data
points without clear justification– Experimental error documented
in your laboratory notebook– Statistical test for outliers
• Don’t “massage” line fits or change parameters post-hoc to best fit your data
0.0 0.5 1.0 1.515
20
25
30
35
40
Variable A
Varia
ble
B
Consider how the four points colored blue influence the regression fit.
Are these outliers? Do they have an outsized impact on the apparent trend?
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
0
25
50
75
100
125
E2PFOS10:2 FtOH
Inhibitor concentration (Log M)
[3 H]-E
2 B
ound
(%)
-12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0
0
25
50
75
100
125
E2PFOS10:2 FtOH
Inhibitor concentration (Log M)
[3 H]-E
2 B
ound
(%)
A switch from 4-parameter to 3-parameter curve allows for fit of 3rd
data set, but is this post-hoc change in analysis appropriate?
3 Essential steps for better data visualization. January 20, 2016
Presenting multiple figures• Similar data presented jointly
should be shown at the same scale (most of the time) to enable appropriate comparisons
• Data groupings should infer what type of analysis was performed. – This presentation suggests two
separate Student’s t-tests. But what if a two-way ANOVA were performed?
The center panel uses a different scale for the same type of data as in the left panel. Clearly, an comparison between ”Water” and “GT” is intended. But the different scales obfuscate the comparison.
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
Water
Control Treatment0
1
2
3
4
5
Res
pons
e (u
nits
)
GT
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
GT
The right panel corrects this problem. Now you can see more easily that the response was overall a bit lower for both control and treatment
in the GT group compared to the Water group. The dashed red line added helps you visualize the impact of the selected scale.
8
3 Essential steps for better data visualization. January 20, 2016
Graph style should reflect experiment design• Data groupings should reflect the
experiment design and the type of statistical analysis performed. – This presentation suggests two
separate Student’s t-tests. But what if a two-way ANOVA were performed?
– This presentation groups the data by control/treatment and by Water/GT, appropriate or this 2x2 experiment design.
– Note that the first approach did not allow for the most interesting comparison, water vs. GT!
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
Water
Control Treatment0
1
2
3
4
5
Res
pons
e (u
nits
)
GT
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
GT
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
Water
Control Treatment0
1
2
3
4
5
Res
pons
e (u
nits
)
GT
Control Treatment0
2
4
6
8
Res
pons
e (u
nits
)
GT
Control Treatment0
1
2
3
4
5
6
7
Res
po
nse
(un
its)
ac
b
a
c
Water GT
Apply principles of effective formatting3
9
3 Essential steps for better data visualization. January 20, 2016
Characteristics of well-designed figures• Neatness – Image is clean and sharp, invites attention• Readability – Eye can discern important information easily, quickly• Font – Text is large enough to be read, placed well and limited• Size – Graph is sized appropriately for anticipated reduction during printing.• Aesthetics – Balanced graphs, good use
of white space, eyes drawn to most important features
• Use of color – Distinguishes important content, pleasing to eye
• Consistency – Similar graphs have same stylistic scheme (line width, font type and size, labeling, scales, etc.)
3 Essential steps for better data visualization. January 20, 2016
A poorly formatted graphLet me count the problems1. No units2. Scale isn’t appropriate 3. Axis, error bars and bar outline
thickness4. Font on Y-axis too small, serif font5. Axis title fonts uneven6. Dependent and independent
variables not on the right axes7. Minor ticks not helpful8. Too many X-axis labels9. Patterns are rather obnoxious, difficult to distinguish10.Categories on Y-axis are not in any useful order
10
3 Essential steps for better data visualization. January 20, 2016
How can I fix it?
Independent variable (treatments) on the X-axis
Dependent variable (measurement) on the Y-axis
Use solid fills and big patterns to distinguish bars
Use large font for axis titles, two sizes smaller for axis labels; be consistent in font sizes
Scale is more appropriate;subdivide Y-axis into big units; don’t use minor ticks (unless log scale)
Use the same thickness for axes, bar outlines and error bars – thick enough to be seen but not overly thick
Order your categories so that they are easily interpreted, tell the story; use white for control or reference group which should be placed at the left-most position
Keep the figure looking clean and easily readable
3 Essential steps for better data visualization. January 20, 2016
Poorly formatted line graphStraight from Excel with no tinkering. What is missing? What are the problems?
1. Grid lines are distracting2. No axis labels3. Category labels are not
informative4. Series aren’t labeled5. Position of data points not
clear (no symbols)6. No error bars7. Fonts too small
0
0.02
0.04
0.06
0.08
0.1
0.12
1 2 3 4 5 6 7
Series1
Series3
11
3 Essential steps for better data visualization. January 20, 2016
Reformatted (not Excel) Keep the field behind the data clear of any grid lines (unless needed to show reference measurement, such as for normalized data)
Use big symbols, easy to see when graph is reduced for publication. Use shading to help distinguish symbols.
Include a legend to identify symbols
Follow previous recommendations for font sizes, axes thickness, etc. Also use the same thickness for symbol outline.
If the first data point falls on top of the Y-axis, then offset the axes for clarity.
3 Essential steps for better data visualization. January 20, 2016
Examples of poorly done graphs
12
3 Essential steps for better data visualization. January 20, 2016
The 3D temptation• Do not use 3D unless absolutely
necessary.• Many other ways to show 3 levels of data
without using a rotated 3D graph
3 Essential steps for better data visualization. January 20, 2016
When 3D works
(Actually, these plots incorporate 4 levels of information; 3 different axes and color!)
13
3 Essential steps for better data visualization. January 20, 2016
Exceptions to the “rules”• Sometimes, breaking the rules of figure formatting is necessary to make
the graph more clear and to emphasize the key findings
• Using horizontal lines to emphasize data trends, deviation from baseline• (Note that these graphs have other design problems)
3 Essential steps for better data visualization. January 20, 2016
Exceptions to the “rules”• Use minor ticks to
distinguish logarithmic scales
• A signal to the reader that the scale is not linear
14
3 Essential steps for better data visualization. January 20, 2016
Exceptions to the “rules”• Horizontal graphs are okay, if they aid in data interpretation
3 Essential steps for better data visualization. January 20, 2016
(Judicious) Use of color in publications• Use color if it is necessary to emphasize
your point• Do not use it to merely to jazz up the paper,
color is typically costly for publication• Suggest keep your control/reference white,
color the other groups. • Use colors with lots of contrast that go well
together; yellow or orange with blue; primary colors; shades within a color.
• Avoid red + green• All this said, this example graph works just
fine as B&W for print publication with the right formatting
15
3 Essential steps for better data visualization. January 20, 2016
Black & White Color
• Color costs money in hard copy publication! So, be prepared to have both B&W and Color versions of your data.
• Definitely use color in your presentations
3 Essential steps for better data visualization. January 20, 2016
Some figures are clearly better in color
16
3 Essential steps for better data visualization. January 20, 2016
Some figures require color
3 Essential steps for better data visualization. January 20, 2016
Poorly designed multi-panel figure
What’s wrong with this multi-panel figure?
17
3 Essential steps for better data visualization. January 20, 2016
Better design
• Only need to show the legend once since it clearly applies to both panels• A graph title is optional (depends on reviewer); use the letters to identify the
panels in the figure legend• Mirror formatting, scale, colors, etc. between the panels when appropriate• Don’t have to repeat Y-axis title for panel B since it is the same measurement
Can I use trendy infographics??
18
3 Essential steps for better data visualization. January 20, 2016
What is an infographic?• A visual image such
as a chart or diagram used to represent information or data
• Highly shareable!
• Makes information accessible to non-experts
3 Essential steps for better data visualization. January 20, 2016
When to use infographic style?• Excellent for oral presentations, poster
presentations– Especially for coordinating complex
ideas presented in introduction or conclusion
• Excellent for public presentation– Infographics excel at distilling complex
data into a simple visual format
• Not generally appropriate for professional journals– Insufficient detail, often no statistics
shown
19
3 Essential steps for better data visualization. January 20, 2016
Use elements of the infographic style• Just as with regular figure, infographic
must be easily discernable within allowed the time frame for viewing
• Create new data charts using different design that focuses less on details
• Avoid excessive detail, as seen in some infographic examples
• You may want to work with professional graphic designer to achieve desired look and style
For more information on infographics, see http://www.slideshare.net/IQ_Agency/5-rulesinfographicsuccess
AIN93G TWD0
20
40
60
80
100
Inci
denc
e (%
)
* No supplementTart Cherry
Tart cherry supplemented dietfewer mice with
colon tumors, but only for those fed a healthy diet
40%
What software do you recommend??
20
3 Essential steps for better data visualization. January 20, 2016
Software options• Excel is just awful for making science graphs
– Especially bad for multi-panel figures– Thinks too much for you
• Try out other software that specializes in scientific data presentation– SigmaPlot http://www.sigmaplot.com/
• Haven’t worked with this one in years• Advanced graphics with somewhat steep learning curve
– GraphPad Prism http://www.graphpad.com/• Easy to use interface• Integrates statistics with graphing• Drawback – pricey for individuals ($100/yr); bulk licensing available• 30-day trial available
3 Essential steps for better data visualization. January 20, 2016
Primer on Prism• Organization of sheets by
folder structure– Data tables– Notes– Analyses– Charts– Layouts
• Each data table is linked to other “family” sheets
21
3 Essential steps for better data visualization. January 20, 2016
Advantages of Prism• Very customizable• Create your own
“template” and then use that for other charts
• Excellent for multi-panel figures
Insulin
AINTWD
DIO0.0
0.5
1.0
1.5
2.0
2.5
Con
cent
ratio
n (n
g/m
l pla
sma)
Leptin
AINTWD
DIO0.0
2.0
4.0
6.0
8.0 Resistin
AINTWD
DIO0
5
10
15
20Tumor Multiplicity
AIN93G TWD DIO VMM MM0.00
0.05
0.10
0.15
0.20
0.25
No.
tum
ors/
mm
col
on
a
b
cac
b
Tumor size
AIN93G TWD DIO VMM MM0
2
4
6
8
Tum
or v
olum
e (m
m3 )
a
b
c c
d
Gain in fat mass
0 2 4 6 8 10 12 14 160
10
20
30
40
Weeks
Fat m
ass
(% o
f BW
)
AIN93GTWDDIOMMVMM
c
bc
abaa
Oral Glucose Tolerance
0 15 30 45 60 75 90 105 120
100
200
300
400AIN93GTWDDIO
Time (min)
Glu
cose
(mg/
dL)
MMVMM
Final body weight
AIN TWD DIO MM VMM
20
25
30
35
40
Gra
ms ab
a
b
b
a
A B
D
C
G H I
Food intake
AIN TWD DIO MM VMM
200
250
300
350
400
Tota
l foo
d co
nsum
ed (g
)
a
bc
bc
Energy intake
AIN TWD DIO MM VMM0
1000
1100
1200
1300
1400
1500
Ener
gy c
onsu
med
(Kca
l)
ab a
c
b
a
Fasting Glucose
AIN TWD DIO MM VMM0
50
100
150
200
Glu
cose
(mg/
dL) a
ab
b
aba
E F
3 Essential steps for better data visualization. January 20, 2016
Advantages of Prism• Integrates basic statistics
– Students t-test– One-way and two-way
ANOVA– Histogram analyses– Contingency analysis
• Includes interpretation of results to help students understand analysis results
• Excellent for curve fitting
22
3 Essential steps for better data visualization. January 20, 2016
Other tools for data visualization• Create diagrams or simple infographics
using MS PowerPoint
• Create cartographic data using Tableau– Free public platform available for non-protected
data
• Work with digital image data using Adobe PhotoShop or other photo editing software*– Follow discipline-specific rules about editing
image-based data
Enrollment: College of Agriculture & Applied Sciences, Fall 2013
About Tableau maps: www.tableausoftware.com/mapdata
195 195
12
86 6
4
4
2
1 11 1
1
1
1
1
Countries
About Tableau maps: www.tableausoftware.com/mapdata
1
3
136
2
1
1
1
7
1
2
1
2
1
1
8
1
3
2
3
11
4
1
1
3
136
2
1
1
1
7
1
2
1
2
1
1
8
1
3
2
3
11
4
1
US States
About Tableau maps: www.tableausoftware.com/mapdata
10
10
14
5
3
6
6
2 2
21
1
1
1
1
1
Utah Counties
Comments and revisions of sample figures
23
3 Essential steps for better data visualization. January 20, 2016
Comments on sample figures• Patterns of lines difficult to discern
– Is it important to be able to distinguish these lines?
– Some colors are quite similar to each other. I see two different green dotted lines that are quite hard to distinguish
• Need a legend to define each of the traces
• Label “Mean N = 355” doesn’t make much sense in context of its location. Does one of the lines indicate the mean response?
• Labels need to be bold, much larger in size. This figure is likely to be reduced for publication.
3 Essential steps for better data visualization. January 20, 2016
Comments on sample figures• Not necessary to label each data
point. Crowds the graph.
• Mixture of colors is confusing. – Avoid red border around image,
especially on the light green background
– Use more contrast between colors or the different locations. Phoenix and Raleigh are quite similar.
• Place Y axis label along the axis, not at the top
• Not necessary to use minor ticks; makes figure look crowded.
24
3 Essential steps for better data visualization. January 20, 2016
Revised example
1 2 3 4 5 6 7 8 9 10 11 12
0
20
40
60
80
100
Month
Tem
pera
ture
(°F)
Average monthly temperature
Phoenix Raleigh Minneappolis
3 Essential steps for better data visualization. January 20, 2016
Comments on sample figure• The grid behind the data is unnecessary, crowds the
graph• If you remove the grid, then make sure the X and Y
axis have ticks at each of the number labels• Font size for the X and Y axes labels should be a bit
bigger• Use a sans serif font, such as Arial or Helvetica• Not entirely clear whether lines represent connections
between symbols or regression analysis• Why a rectangular plot? Scales seem very similar for
X and Y axes. Perhaps square plot is needed to show relationships between variables more precisely.• Symbols need to be easily distinguished. X and the
asterisk are too similar, especially right near each other.
25
3 Essential steps for better data visualization. January 20, 2016
Comments on sample figure
0 2 4 6 80
2
4
6
8
Differential Head over Levee (m)
Hea
d at
Toe
(m)
Cch1 (r2 = 0.994)Cch3 (r2 = 0.999)Cch4 (r2 = 0.996)Cch5 (r2 = 0.990)
I assumed connecting line was a regression in this example, and showed the r2 valueI also shortened the legend text, as the 1.47E-1 cm2/s can easily go into the figure legend text. But, if showing the figure as a presentation, the author may want to keep that info.
3 Essential steps for better data visualization. January 20, 2016
Contact for more information or questions
Abby D. BenninghoffDepartment of Animal, Dairy and Veterinary [email protected]