Creating Publication Quality Graphics in R

3.3 Box-and-Whisker Plots (lattice)

Boxplots are probably the most useful way to visualize the distribution of your data, and they allow easy comparisons between the levels of a factor. See Wikimedia for a visual representation of R's standard boxplot settings in relation to the mean and standard deviation of a normal distribution.
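To make those standard settings concrete, here is a minimal sketch using base R's boxplot.stats(), which lattice's panel.bwplot() uses by default to compute the box and whiskers. The simulated sample and all variable names are illustrative assumptions, not part of the original example.

```r
# Illustrative sample: 1000 draws from a standard normal distribution
set.seed(42)
x <- rnorm(1000)

# coef = 1.5 is the default whisker multiplier
bp <- boxplot.stats(x, coef = 1.5)

# bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
five_num <- bp$stats

# Whiskers extend to the most extreme data points that lie within
# 1.5 times the hinge spread (roughly the IQR) of the box
hinge_spread <- five_num[4] - five_num[2]
lower_fence <- five_num[2] - 1.5 * hinge_spread
upper_fence <- five_num[4] + 1.5 * hinge_spread

# Points beyond the fences are drawn individually as outliers
outliers <- bp$out
print(five_num)
print(length(outliers))
```

Changing coef moves the fences and therefore changes which points are flagged as outliers.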

So without further ado, here’s a basic lattice boxplot.

library(lattice)
data(diamonds, package = "ggplot2")  # the diamonds data ships with ggplot2

bw_lattice <- bwplot(price ~ color, data = diamonds)
bw_lattice

Figure 3.12: A basic boxplot produced with lattice.

Not very pretty, is it? So let's again modify the standard par.settings to give our boxplot an acceptable appearance. The result is shown in Figure 3.13.

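# Start from a copy of the current lattice settings
bw_theme <- trellis.par.get()
# Median marker: a vertical bar instead of the default dot
bw_theme$box.dot$pch <- "|"
# Box outline and fill
bw_theme$box.rectangle$col <- "black"
bw_theme$box.rectangle$lwd <- 2
bw_theme$box.rectangle$fill <- "grey90"
# Whiskers: solid black lines
bw_theme$box.umbrella$lty <- 1
bw_theme$box.umbrella$col <- "black"
# Outliers: large grey asterisks
bw_theme$plot.symbol$col <- "grey40"
bw_theme$plot.symbol$pch <- "*"
bw_theme$plot.symbol$cex <- 2
# Strip background (only visible in multi-panel plots)
bw_theme$strip.background$col <- "grey80"

# Apply the theme to the existing plot object
l_bw <- update(bw_lattice, par.settings = bw_theme)

print(l_bw)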

Figure 3.13: A lattice boxplot with modified graphical parameter settings.
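As a possible alternative to copying and editing the full settings object, par.settings also accepts a partial list that names only the entries to change. The sketch below assumes this shorter form is acceptable for your workflow; it uses the built-in iris data as a stand-in so that it runs without the diamonds data, and the names bw_settings and bw_iris are made up for illustration.

```r
library(lattice)

# Only the settings that differ from the defaults need to be listed
bw_settings <- list(
  box.dot       = list(pch = "|"),
  box.rectangle = list(col = "black", lwd = 2, fill = "grey90"),
  box.umbrella  = list(lty = 1, col = "black"),
  plot.symbol   = list(col = "grey40", pch = "*", cex = 2)
)

# iris is used here only as a small built-in stand-in data set
bw_iris <- bwplot(Sepal.Length ~ Species, data = iris,
                  par.settings = bw_settings)
print(bw_iris)

# The settings travel with the plot object; the global defaults
# reported by trellis.par.get() are left as they were
str(trellis.par.get("box.rectangle"))
```

Because the settings are attached to the plot object rather than set globally, other plots in the same session keep their own appearance.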

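# One panel per cut; varwidth = TRUE ties box widths to group sizes
bw_lattice <- bwplot(price ~ color | cut, data = diamonds,
                     asp = 1, as.table = TRUE, varwidth = TRUE)
# clrs_hcl() is not defined in this section; it is assumed to be the
# color palette helper from earlier in this tutorial
l_bw <- update(bw_lattice, par.settings = bw_theme, xlab = "color",
               fill = clrs_hcl(7),
               xscale.components = xscale.components.subticks,
               yscale.components = yscale.components.subticks)

print(l_bw)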

Figure 3.14: A lattice panel boxplot with colored boxes and box widths relative to the number of observations.

Besides the rather obvious use of a color palette to fill the boxes, in this final boxplot we have also told lattice (via varwidth = TRUE) to adjust the widths of the boxes so that they reflect the relative sample sizes of the factor levels (colors). This is a handy way of showing how the observations are distributed along the factor on the x-axis: without any additional plot, we can see that some colors are much less represented than others (‘J’ compared to ‘G’, for example, especially for the ‘Ideal’ cut).
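The link between box width and sample size can be checked by tabulating the group counts next to the plot. The following is a minimal sketch on made-up, deliberately unbalanced data; the data frame toy, its columns, and the group sizes are hypothetical and chosen only to make the effect easy to see.

```r
library(lattice)

# Hypothetical unbalanced data: group sizes of 200, 50 and 10
set.seed(1)
grp_sizes <- c(a = 200, b = 50, c = 10)
toy <- data.frame(
  grp   = factor(rep(names(grp_sizes), times = grp_sizes)),
  value = rnorm(sum(grp_sizes))
)

# The counts that varwidth = TRUE encodes in the box widths
grp_counts <- table(toy$grp)
print(grp_counts)

# Wider boxes correspond to groups with more observations
bw_toy <- bwplot(value ~ grp, data = toy, varwidth = TRUE)
print(bw_toy)
```

For the diamonds data, the equivalent check would be table(diamonds$color, diamonds$cut), which gives the number of observations behind each box in each panel.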