Back on July 10th I listened to a very informative webinar, Communicating through infographics. Hosted by SLA’s DC Chapter and presented by Duke University’s Dr Christa Kelleher, the session provided an overview of recommendations and potential pitfalls for creating infographics. Looking back over my notes all these weeks later they seem rather stream-of-consciousness, so I’m almost reluctant to write them up in case I make the webinar sound a lot less coherent than it actually was! However, I’m moving house soon and I don’t want to risk losing them (they make sense to me!), so here goes…
for data analysis = Matlab, R, Python
for spatial analysis = ArcGIS, R Mapping
for fine tuning = Illustrator, Inkscape
How to create effective visualisations:
1. Choose an effective plot for your data & message (Stephen Few), eg. line graph for data, heat map for positional info.
Dot plots are cleaner than bar charts for lots of different entities in the same graphic.
Pie charts are considered ineffective for 4 or more entities – consider a % table instead.
2. Remove ‘chart junk’ (Tufte, the “grandfather of visualisation”). Stay away from redundant, superfluous or non-data information, and avoid 3D graphics (except in a few very specific cases). Take out excess lines and do all you can to make things cleaner. Don’t add unnecessary colour; use it sparingly to highlight key details.
3. Display the same number of dimensions as the dataset – this makes it easy to spot attributes that differ from the norm.
4. Consider the use of colour (or greyscale):
for sequential data keep to the same colour sequence
for diverging data depict the mean in a middle colour and have the others diverge outwards
for categorical data use random colours which are easily distinguished
colorbrewer2.org is useful, and there’s also an R plugin for this
5. Maintain axes when comparing subplots – may need to be manually changed (eg. don’t have one set of data grouped in 10s and the other in 100s, use same scale for both).
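To make point 5 concrete, here is a minimal Python sketch of working out one axis range to reuse on every subplot. The function name `shared_limits`, the `padding` parameter and the data are my own illustrations, not something shown in the webinar. In matplotlib, I believe the same effect comes from passing `sharey=True` to `plt.subplots` or calling `set_ylim` with the same values on each subplot.

```python
def shared_limits(*series, padding=0.05):
    """Return one (low, high) axis range that covers every series."""
    values = [v for s in series for v in s]
    low, high = min(values), max(values)
    margin = (high - low) * padding  # small margin so points don't sit on the frame
    return low - margin, high + margin


site_a = [12, 18, 25, 31]       # made-up data
site_b = [140, 220, 310, 480]   # made-up data, roughly ten times larger

# Use the same limits for both subplots instead of letting each one autoscale.
low, high = shared_limits(site_a, site_b)
print(low, high)
```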
What people look for in a visualisation, and good practice for different formats:
Bar charts = magnitude
line plots = change
dot plots = correlation
frequency plots = distribution
Reference to zero (y axis) (Nathan Yau)
Rotate if more than 8-10 categories (so they run vertically down the left hand side)
Include a legend
Be aware of scaling (Gary Klass)
Stacked bar charts make sense if you’re comparing whatever’s on the bottom BUT make sure both sets of data are using the same scale (switch to % if numbers are unworkable)
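As a rough illustration of the "switch to %" tip, this Python sketch converts the segments of each stacked bar to percentages of that bar's total, so bars with very different totals can share one 0-100 scale. The function name and the library counts are made up for the example.

```python
def to_percentages(counts):
    """Convert one stacked bar's segment counts to percentages of the bar total."""
    total = sum(counts.values())
    return {label: 100 * value / total for label, value in counts.items()}


small_library = {"books": 30, "journals": 15, "media": 5}           # made-up counts
large_library = {"books": 4200, "journals": 4800, "media": 3000}    # made-up counts

print(to_percentages(small_library))  # {'books': 60.0, 'journals': 30.0, 'media': 10.0}
print(to_percentages(large_library))  # {'books': 35.0, 'journals': 40.0, 'media': 25.0}
```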
Consider the aspect ratio (William Cleveland) – ie. the ratio of width to height; a ratio that makes the lines slope at around 45° is optimal
But… a smaller aspect ratio allows you to show long-term trends, whilst a higher ratio highlights short-term variations
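For anyone who wants to try the aspect ratio idea, below is a small Python sketch of one simple way to "bank" a line plot: pick the width-to-height ratio at which the median line segment is drawn with a slope of 1. This is my own simplified reading of Cleveland's guideline rather than the method shown in the webinar, and the function name and data are hypothetical.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Width/height ratio at which the median line segment is drawn at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Absolute slope of each segment in data units; flat and vertical
    # segments are skipped (a simplifying choice for this sketch).
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        if x1 != x0 and y1 != y0
    ]
    # On screen a data slope s appears as s * (x_range / y_range) * (height / width),
    # so setting that to 1 for the median slope gives width / height.
    return median(slopes) * x_range / y_range


xs = [0, 1, 2, 3, 4]
zigzag = [0, 1, 0, 1, 0]  # made-up data that flips direction at every step

print(banked_aspect_ratio(xs, zigzag))  # 4.0, ie. four times as wide as tall
```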
Use a log scale for y axis if comparing 2 significantly different data sets, or use a different y axis for each (eg. one on the left side & the other on the right)
Horizon graphs can be good for compressing data into a small space
Use density, make points transparent
Matrices highlight relationships across data, colour can be used to plot 3rd dimension
Combination of points and line graph – eg. monthly changes over a period of several years – used to show values and trends through time.
Box plots allow a lot of info to be displayed in a small space, as they show
– overall distribution
– skew (difference between quartiles and median)
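Assuming the notes above describe box plots, the numbers behind one can be sketched in Python with the standard library. The function name and data are made up, and the "inclusive" quartile method is just one of several conventions.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


data = [2, 4, 4, 5, 6, 7, 9, 12, 20]  # made-up data
low, q1, q2, q3, high = five_number_summary(data)
print(low, q1, q2, q3, high)  # 2 4.0 6.0 9.0 20

# If the upper quartile sits further from the median than the lower one,
# the data is skewed towards higher values.
print(q3 - q2, q2 - q1)  # 3.0 2.0
```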
Consider the number of bins in a histogram, as its shape is very sensitive to both the number of bins and the density of the data.
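To see how much the bin count matters, this Python sketch counts the same made-up values into different numbers of equal-width bins. The function names are my own, and Sturges' rule is included only as one common starting point, not as something recommended in the webinar.

```python
import math


def histogram_counts(values, bins):
    """Count values into equal-width bins (assumes the values are not all identical)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return counts


def sturges_bins(n):
    """Sturges' rule of thumb for the number of bins given n observations."""
    return math.ceil(math.log2(n)) + 1


values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up data

print(histogram_counts(values, 2))  # [3, 6]
print(histogram_counts(values, 4))  # [1, 2, 3, 3]
print(sturges_bins(len(values)))    # 5
```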
Use kernel density estimators to achieve a smooth plot
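For the curious, a kernel density estimator can be sketched in a few lines of plain Python: each data point contributes a small Gaussian bump, and the bumps are averaged into a smooth curve. The function name, the sample data and the `bandwidth` value (which controls how smooth the curve is) are assumptions for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


samples = [1.0, 1.5, 2.0, 2.2, 4.8, 5.0]        # made-up data with two clusters
density = gaussian_kde(samples, bandwidth=0.5)  # bandwidth chosen by eye

# Evaluate on a grid of x values; plotting these gives the smooth curve.
for x in [0, 1, 2, 3, 4, 5, 6]:
    print(x, round(density(x), 3))
```

A larger bandwidth gives a smoother curve and a smaller one shows more detail, much like the choice of bin count.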
A flow diagram where arrow width is proportional to the quantity of material flow
Originally for economics data
Graphics in multiple dimensions, more for interaction than presentation
Other recommended resources (some from Christa, some by webinar participants):
Tableau: http://www.tableausoftware.com, free public version (Duke library has libguide)
rforcats.net = help
Google public data
Hans Rosling (TED talk)