3.3 Box-and-Whisker Plots (lattice)
Boxplots are probably the most useful way to visualize the nature (or distribution) of your data, and they make comparisons easy, for example between the levels of a factor. See Wikimedia for a visual representation of R's default boxplot settings in relation to the mean and standard deviation of a normal distribution.
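To make that relation concrete, here is a small sketch in base R. It assumes the default whisker rule of 1.5 times the interquartile range (the `coef = 1.5` default of `boxplot.stats()`) and a standard normal distribution; the variable names are only illustrative.

```r
# Quartiles of the standard normal distribution (mean 0, sd 1)
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
iqr <- q3 - q1                     # interquartile range, about 1.349 sd

# Default whisker rule: whiskers reach the most extreme data point
# that lies within 1.5 * IQR of the box
upper_fence <- q3 + 1.5 * iqr      # about 2.698 sd above the mean
lower_fence <- q1 - 1.5 * iqr      # about 2.698 sd below the mean

# Share of a normal sample expected to fall between the two fences
inside <- pnorm(upper_fence) - pnorm(lower_fence)
round(c(iqr = iqr, upper_fence = upper_fence, inside = inside), 3)
```

Under these assumptions roughly 99.3% of normally distributed values lie within the whiskers, so only about 0.7% would be drawn as individual outlier points.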
So without further ado, here’s a basic lattice boxplot.
bw_lattice <- bwplot(price ~ color, data = diamonds)
bw_lattice
Not very beautiful… So let's again modify the standard par.settings to give our boxplot an acceptable visual appearance. The result looks much better, doesn't it?
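# start from the current lattice settings
bw_theme <- trellis.par.get()
# draw the median as a line instead of a dot
bw_theme$box.dot$pch <- "|"
# box: black outline, thicker line, light grey fill
bw_theme$box.rectangle$col <- "black"
bw_theme$box.rectangle$lwd <- 2
bw_theme$box.rectangle$fill <- "grey90"
# whiskers: solid black lines
bw_theme$box.umbrella$lty <- 1
bw_theme$box.umbrella$col <- "black"
# outliers: large grey asterisks
bw_theme$plot.symbol$col <- "grey40"
bw_theme$plot.symbol$pch <- "*"
bw_theme$plot.symbol$cex <- 2
# strip background for conditioned panels
bw_theme$strip.background$col <- "grey80"
# apply the theme to the existing plot object only
l_bw <- update(bw_lattice, par.settings = bw_theme)
print(l_bw)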
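If you are unsure which settings exist, you can inspect the theme before changing it. The following is a minimal sketch that assumes only that the lattice package is installed; `par_names` is an illustrative variable name.

```r
library(lattice)

# Names of all top-level graphical parameter groups
par_names <- names(trellis.par.get())
par_names

# Components of the groups that control the box and the whiskers
str(trellis.par.get("box.rectangle"))
str(trellis.par.get("box.umbrella"))
```

Each group is a plain list, which is why its components can be overwritten with `$` as in the theme above.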
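# one panel per cut; varwidth = TRUE scales the box widths by sample size
bw_lattice <- bwplot(price ~ color | cut, data = diamonds,
                     asp = 1, as.table = TRUE, varwidth = TRUE)
# clrs_hcl() is not part of lattice; it is assumed to be the palette
# helper defined earlier in this tutorial
# the *.subticks functions come from the latticeExtra package
l_bw <- update(bw_lattice, par.settings = bw_theme, xlab = "color",
               fill = clrs_hcl(7),
               xscale.components = xscale.components.subticks,
               yscale.components = yscale.components.subticks)
print(l_bw)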
In addition to the rather obvious provision of a color palette to fill the boxes, in this final boxplot we have also told lattice (via varwidth = TRUE) to adjust the widths of the boxes so that they reflect the relative sample sizes of the factor levels (colors). This is a handy way of showing how the data are distributed along the factor on the x-axis: without any additional plot, we can see that some factor levels (i.e. colors) are much less represented than others ('J' compared to 'G', for example, especially for the 'Ideal' quality class).
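The sample sizes that the box widths encode can also be checked numerically. This is a minimal sketch that assumes the ggplot2 package is installed (it ships the diamonds data); `counts` is an illustrative variable name.

```r
# Assumes ggplot2 is installed; only its diamonds data set is used
data(diamonds, package = "ggplot2")

# Number of diamonds per color (rows) and cut (columns)
counts <- xtabs(~ color + cut, data = diamonds)
counts

# Share of each color within each cut, in percent
round(100 * prop.table(counts, margin = 2), 1)
```

The table should mirror what the box widths suggest, with color 'J' having far fewer observations than 'G' within the 'Ideal' cut.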