1 Data Handling

One thing that is often overlooked is that visualizing data in R (or in any other programming language, for that matter) usually takes more effort than simply calling a plot function. A data visualization is in essence an abstract representation of raw data. Most plotting routines available in R, however, do not perform any useful data abstraction for us. This means that it is up to us to bring our data to a level of abstraction that suits what we want to show with our visualization.

Therefore, before we start producing plots, we need to spend some time getting familiar with a few tools for manipulating raw data sets. In particular, we will learn how to subset(), aggregate(), sort(), and merge() our data sets. For the sake of reproducibility, all examples in this workshop use the diamonds data set (which comes with ggplot2).

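### here's a rather handy way of loading all the packages that your
### code needs in one go (they must already be installed)
pkg <- c('ggplot2', 'latticeExtra', 'gridExtra', 'MASS',
         'colorspace', 'plyr', 'Hmisc', 'scales')
### library() returns the list of attached packages for each call;
### we store that in a throwaway object to keep the console clean
jnk <- sapply(pkg, library, character.only = TRUE)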

### load the diamonds data set (comes with ggplot2)
data(diamonds)
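
As a first orientation, the four functions named above can be sketched on the diamonds data in a few lines. This is only a minimal preview and not the workshop's own worked example: the object names (`ideal`, `mean_price`, `by_price`, `cut_counts`, `summary_tbl`) and the choice of columns are assumptions made for illustration. Note that sort() operates on vectors, so the sketch uses order() to sort the rows of a whole data set.

```r
### minimal preview of the four data handling tools (illustrative only)
library(ggplot2)   # provides the diamonds data set
data(diamonds)

### subset(): keep only the rows that satisfy a condition
ideal <- subset(diamonds, cut == "Ideal")

### aggregate(): summarise one variable by groups of another,
### here the mean price for each cut quality
mean_price <- aggregate(price ~ cut, data = diamonds, FUN = mean)

### sort() sorts a single vector ...
price_sorted <- sort(diamonds$price)
### ... while order() returns the row index needed to sort a whole data set
by_price <- diamonds[order(diamonds$price), ]

### merge(): join two tables on a shared column
### (cut_counts is a hypothetical helper table: number of diamonds per cut)
cut_counts <- aggregate(carat ~ cut, data = diamonds, FUN = length)
names(cut_counts)[2] <- "n"
summary_tbl <- merge(mean_price, cut_counts, by = "cut")
```

Printing `summary_tbl` should show one row per cut quality, with the mean price and the number of diamonds side by side.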

Right, enough of the introductory talk. Let’s get our hands dirty!