Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Data Analysis

Use R for data analysis and visualization, handle geo-datasets, train models and estimate errors, and use GitHub for comprehensive documentation and task man...

Posts

examples

unit01

First Things First

Go through a brute force introduction to R, R Markdown, the RStudio IDE, version management with Git, and GitHub’s classroom functionality to get ready for ...

R and RStudio

To start with a clarification: R is the statistical programming language you will use in this course (and which is used by many other scientists). With R yo...

Example: Vector Basics

Vectors are the basis for many data types in R. Creating a vector A vector is created using the c function. Here are some examples: my_vector_1 <- c(1,2...

Example: List Basics

Lists are one of the most flexible data structures in R. Creation of a list A list is created with the list function. Here are some examples: my_list_1 <...

Example: Data Frame Basics

Data frames are one of the most heavily used data structures in R. Creation of a data frame A data frame is created from scratch by supplying vectors to the...

Example: Coercion

Data types Coercion of data elements into one of the basic R data types is straightforward. Just add “as.” to the data type to obtain the respective functi...

Example: Extracting Substrings

Extracting or replacing parts of a string is quite straightforward but requires somewhat more typing than, e.g., in Python. The main function you will use is t...

Example: R Markdown with HTML output

This page shows what a compiled R Markdown file looks like (in fact, all code examples in this course were compiled with R Markdown). This is a header This ...

Git and GitHub

To start with a clarification: Git is the version control system you will use in this course (and which is used by many other developers all around the world...

Assignments and Working Environment

A note on individual learning log assignments with GitHub Within this course, you will individually submit your personal solutions for the course assignments...

unit02

First Things Second

Look closer at data sets and data types before focusing on the most important features of programming languages, namely run-time control and loop structures.

Operators

Control structures require logical or Boolean operators that test simple relationships between two or more variables. Depending on the test resul...

Decisions and Loops

Decision structures are like junctions in the analysis workflow and decide which way to go during runtime. Loops are the workhorses for repeating the same co...

Example: If-then-else

If-then-else statements are the controlling structures in each program. The simplest form is: a <- 5.0 b <- 10.0 if (a < b) { print("a is sm...

Example: For-loops

For loops are the mother of all repeating structures, which enable the execution of certain code blocks multiple times. For loops are useful if the numb...

Example: Lapply-loops

As an example of the apply family, we briefly introduce lapply. Basic structure The lapply function can be your workhorse when it comes to loops over data fr...

Example: While-loops

Like for loops, while loops are repeating structures that enable the execution of certain code blocks multiple times. The difference is that for while lo...

Code Styling

Using a consistent and intuitive coding style helps both you as the programmer and others as re-users of your source code. A comprehensive coding style helps...

Unmarked Assignment: Loop and Conquer

This worksheet provides some control structure and loop examples to help you get familiar with what are probably the most important properties of any programmin...

unit03

Look at Your Data

Become familiar with reading and writing data, computing summary statistics and visual data exploration as the basics of data analysis.

Tabulated Data I/O

Reading tabulated data into a data frame, or writing it out from one, is quite a common task in data analysis. You could use the read.table function for this. df <-...

Visualization

Do not wait until the very final analysis stage to produce some publication quality graphics but produce fast (not necessarily nice) visualizations all the w...

Example: CSV I/O

Reading data from CSV files Reading CSV files is done using the read.table function from R’s utils package. The function will return a data frame whic...

Example: Aggregation Statistics

Summarizing a data set The most straightforward function that returns some aggregated statistical information about a data set is summary. a <- c("A",...

Example: Visual Data Exploration

Visual data exploration should be one of the first steps in data analysis. In fact, it should start right after reading a data set. The following examples ar...

Marked Assignment: Read and Plot

This worksheet will guide you in getting a first overview of the wood harvest in Hessen between 1997 and 2014 using visual data exploration. After completi...

unit04

Clean Your Data

Check the integrity of datasets and clean them up to ensure that the data basis for your analysis is consistent.

Example: Missing Values

Handling missing values is straightforward. Let’s start with a vector with one NA value at position 3. Please note that NA is not inside quotation marks sin...

Example: Date/Time

Coercing data types to date and/or time information is generally performed using as.Date, as.POSIXct, or as.POSIXlt. Let’s start with as.Date: as.Da...

Example: Sorting

Sorting vectors or lists Vectors can be sorted using the sort function. If you want to sort a list, you have to access the actual elements since sort require...

Example: Cleaning Columns

Cleaning data frames involves quite different aspects like splitting cell entries, converting data types or the conversion of “wide” to “long” format. In ge...

Example: Merging

When thinking about combining two data frames one has to distinguish between merging them by the values given in a specific column or consecutively putting t...

Unmarked Assignment: Cleaning Crops

This assignment is the first in a series that uses regional statistical data. While the wood harvest data from Hessen was (i) quite small and (ii) quite tidy...

unit05

Describe your linear data

Compute simple statistical linear regression models that relate a dependent to an independent variable.

unit06

Predict your linear data

Compute simple linear models to predict dependent data and assess the performance by independent test samples.

Cross-validation

Test statistics can describe the quality or accuracy of regression models if the assumptions of the models are met. However, the assessment would still be b...

unit07

Select your variables

Evaluate the importance of your independent variables and select an optimal subset for your prediction model.

unit08

Tune your model

Evaluate model tuning strategies and find optimal settings for your prediction model.

Generalized additive models

So far, the models have only considered linear relationships. The corresponding model type to simple linear models would be an additive model and for Poisson...

Unmarked Assignment: Model Tuning

This worksheet uses cross-validation strategies for tuning an additive model. After completing this worksheet you should have improved your skills on handli...

unit09

Predict Your Temporal Data

Look into some specific characteristics of time series data and predict future observations based on past dynamics.

Time Series

Although we have already had contact with some temporal datasets, we have not yet taken a closer formal look at time series analysis. Time series datasets often inhibi...

Predicting time series

Time-series analyses can generally be divided into forecasting future dynamics and describing and potentially explaining past patterns. Since the latter ofte...

unit10

Time Series Decomposition

After looking into time-series forecasting, we will now switch to some basics of describing time series. To illustrate this, we will again use the (mean mont...

Time series clustering

Just as one last example of time series analysis for this module, and mainly to demonstrate that this module only touched on a very small set of analysis concepts ...

Unmarked Assignment: NAO and Coelbe

This worksheet focuses on the comparison of some meteorological time series data recorded at a station near Marburg University Forest with some global teleconn...

unit11

unit12

Publication Quality Graphics

Visualize your data, get some hints for publication quality graphics, and learn about some packages specifically made for visualizations.

Example: Colours

Before we expand our plotting capabilities, we want to spend a bit more time thinking about colours and colour spaces. A careful study of colour-spaces (e....

Example: Colours and maps

This is a short example of how to use the hcl colour palette for colouring features of a shapefile. Load the required packages library("rgdal") library("ras...

Example: Tics

The following plotting examples will revisit R’s generic plotting functions and pimp them up a little bit. The underlying example data is taken from our dat...

Example: Wide and Long Format

The following is a short note on converting wide to long format required e.g. for some lattice or ggplot visualizations. The following examples are based on ...

Example: The R Graph Gallery

Finally, check out the R Graph Gallery to get an impression of the many other data visualization possibilities in R.

worksheets