2 A Brief Note on using Colors

Before we start plotting our data, we need to spend some time to have a closer look at color representation of certain variables. A careful study of color spaces (e.g. Zeileis, Hornik, and Murrell (2009), HCLwizard, Aisch (2011) or Wikipedia) leads to the conclusion that the HCL color space is preferable when mapping a variable to color (be it factorial or continuous).

This color space is readily available in R through the package colorspace and the function of interest is called hcl().

In the code chunk that follows we will create a color palette that varies in both color (hue) and also luminance (brightness) so that this can be distinguished even in grey-scale (printed) versions of the plot. As a reference color palette we will use the ‘Spectral’ palette from ColorBrewer which is also a multi-hue palette but represents a diverging color palette. Hence, each end of the palette will look rather similar when converted to greyscale.

library(RColorBrewer)

clrs_spec <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
clrs_hcl <- function(n) {
  hcl(h = seq(230, 0, length.out = n), 
      c = 60, l = seq(10, 90, length.out = n), 
      fixup = TRUE)
  }

### function to plot a color palette
pal <- function(col, border = "transparent", ...)
{
 n <- length(col)
 plot(0, 0, type="n", xlim = c(0, 1), ylim = c(0, 1),
      axes = FALSE, xlab = "", ylab = "", ...)
 rect(0:(n-1)/n, 0, 1:n/n, 1, col = col, border = border)
}

So here’s the Spectral palette from ColorBrewer interpolated over 100 colors:

pal(clrs_spec(100))
A diverging rainbow color palette from [ColorBrewer](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3).

Figure 2.1: A diverging rainbow color palette from ColorBrewer.

And this is what it looks like in greyscale:

pal(desaturate(clrs_spec(100)))
Same palette as above in grey-scale.

Figure 2.2: Same palette as above in grey-scale.

We see that this palette varies in lightness from a very bright center to darker ends on each side. Note, that the red end is slightly darker than the blue end.

This is quite ok in case we want to show some diverging data (deviations from a zero point, for example the mean). However, if we are dealing with a sequential measure, such as temperature, or in our case the density of points plotted per some grid cell, we really need to use a sequential color palette. There are two common problems with sequential palettes:

  1. We need to create a palette that maps the data accurately. This means that the perceived distances between the different hues utilized need to reflect the distances between our data points. AND this distance needs to be constant, no matter between which two point of the palette we want to estimate their distance. Let us consider the following example showing a classic ‘rainbow’ palette (MATLAB refers to this as ‘jet colors’):
pal(rainbow(100))
The classic rainbow color palette.

Figure 2.3: The classic rainbow color palette.

It becomes obvious that there are several thin bands in this color palette (yellow, aquamarine, purple) which do not map the distances between variable values accurately. That is, the distance between two values located in or around the yellow region of the palette will seem to change faster than, for example, somewhere in the green region (and red and blue).

When converted to greyscale, this palette produces a hell of a mess:

pal(desaturate(rainbow(100)))
Same palette as above in grey-scale.

Figure 2.4: Same palette as above in grey-scale.

Note, that this palette is maybe the most widely used color coding palette for mapping a sequential continuous variable to color. We will see further examples later on… We hope you get the idea that this is not a good way of mapping a sequential variable to color!

hcl() produces so-called perceptually uniform color palettes and is therefore much better suited to represent sequential data:

pal(clrs_hcl(100))
An HCL based multi-hue color palette with increasing luminance towards the red end.

Figure 2.5: An HCL based multi-hue color palette with increasing luminance towards the red end.

We see that the different hues are spread constantly over the palette and therefore it is easy to estimate distances between data values from the changes in color. The fact that we also vary luminance here means that we get a very light color at one end (in our case, the red end which is more of a pink tone than red). This might, at first sight, not seem very aesthetically pleasing, yet it enables us to encode our data even in greyscale:

pal(desaturate(clrs_hcl(100)))
Same palette as above in grey-scale.

Figure 2.6: Same palette as above in grey-scale.

As a general suggestion, we encourage you to make use of the HCL color space whenever you can. But most importantly, it is essential to do some thinking before mapping data to color. The most basic question you always need to ask yourself is probably

What is the nature of the data that I want to show? More precisely, is it sequential, diverging, or qualitative?

Once you know this, it is easy to choose the appropriate color palette for the mapping. A good place to start for choosing palettes that are perceptually well thought through is ColorBrewer.

References

Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2009. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics and Data Analysis 53: 3259–70. doi:10.1016/j.csda.2008.11.033.