Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
A flexible two-column Jekyll theme. Perfect for personal sites, blogs, and portfolios hosted on GitHub or your own server. Latest release v4.9.1
Data Analysis
Use R for data analysis and visualization, handle geo-datasets, train models and estimate errors, and use GitHub for comprehensive documentation and task man...
Splash Page
Bacon ipsum dolor sit amet salami ham hock ham, hamburger corned beef short ribs kielbasa biltong t-bone drumstick tri-tip tail sirloin pork chop.
Posts
examples
unit01
First Things First
Go through a brute force introduction to R, R Markdown, the RStudio IDE, version management with Git and GitHub’s classroom functionality to get ready for ...
R and RStudio
To start with a clarification: R is the statistical programming language you will use in this course (and which is used by many other scientists). With R yo...
Example: Vector Basics
Vectors are the basis for many data types in R. Creating a vector A vector is created using the c function. Here are some examples: my_vector_1 <- c(1,2...
Example: List Basics
Lists are one of the most flexible data structures in R. Creation of a list A list is created with the list function. Here are some examples: my_list_1 <...
Example: Data Frame Basics
Data frames are one of the most heavily used data structures in R. Creation of a data frame A data frame is created from scratch by supplying vectors to the...
Example: Coercion
Data types Coercion of data elements into one of the basic R data types is straightforward. Just add “as.” to the data type to obtain the respective functi...
Example: Extracting Substrings
Extracting or replacing parts of a string is quite straightforward but requires some more typing than, e.g., in Python. The main function you will use is t...
Example: R Markdown with html output
This page shows what a compiled R Markdown file looks like (in fact, all code examples in this course were compiled with R Markdown). This is a header This ...
Git and GitHub
To start with a clarification: Git is the version control system you will use in this course (and which is used by many other developers all around the world...
Assignments and Working Environment
A note on individual learning log assignments with GitHub Within this course, you will individually submit your personal solutions for the course assignments...
Marked Assignment: Hello R, Hello GitHub
This worksheet introduces you to R, R scripts and R markdown. Your submission will be pushed to your class repository at GitHub. After completion you should ...
unit02
First Things Second
Look closer at data sets and data types before focusing on the most important features of programming languages, namely run-time control and loop structures.
Operators
Control structures require logical or maybe also boolean operators which test simple relationships between two or more variables. Depending on the test resul...
Decisions and Loops
Decision structures are like junctions in the analysis workflow and decide which way to go during runtime. Loops are the workhorses for repeating the same co...
Example: If-then-else
If-then-else statements are the controlling structures in each program. The simplest form is: a <- 5.0 b <- 10.0 if (a < b) { print("a is sm...
Example: For-loops
For loops are the mother of all repeating structures which enable certain code blocks to be executed multiple times. For loops are useful if the numb...
Example: Lapply-loops
As an example of the apply family, we briefly introduce lapply. Basic structure The lapply function can be your workhorse when it comes to loops over data fr...
Example: While-loops
Like for loops, while loops are repeating structures which enable certain code blocks to be executed multiple times. The difference is that for while lo...
Code Styling
Using a consistent and intuitive coding style helps both you as the programmer and others as re-users of your source code. A comprehensive coding style helps...
Unmarked Assignment: Loop and Conquer
This worksheet provides some control structure and loop examples to help you get familiar with these probably most important properties of any programmin...
unit03
Look at Your Data
Become familiar with reading and writing data, computing summary statistics and visual data exploration as the basics of data analysis.
Tabulated Data I/O
Reading or writing tabulated data into or from a data frame is a quite common task in data analysis. You could use the read.table function for this. df <-...
Visualization
Do not wait until the very final analysis stage to produce some publication quality graphics but produce fast (not necessarily nice) visualizations all the w...
Example: CSV I/O
Reading data from csv files Reading csv files is realized using the read.table function from R’s utils library. The function will return a data frame whic...
Example: Aggregation Statistics
Summarizing a data set The most straightforward function which returns some aggregated statistical information about a data set is summary. a <- c("A",...
Example: Visual Data Exploration
Visual data exploration should be one of the first steps in data analysis. In fact, it should start right after reading a data set. The following examples ar...
Marked Assignment: Read and Plot
This worksheet will guide you in getting a first overview of the wood harvest in Hessen between 1997 and 2014 using a visual data exploration. After completi...
unit04
Clean Your Data
Check the integrity of datasets and clean them up to ensure that the data basis for your analysis is consistent.
Example: Missing Values
Handling missing values is straightforward. Let’s start with a vector with one NA value at position 3. Please note that NA is not inside quotation marks sin...
Example: Date/Time
Coercing data types to date and/or time information is generally performed using as.Date, as.POSIXct or as.POSIXlt. Let’s start with as.Date: as.Da...
Example: Sorting
Sorting vectors or lists Vectors can be sorted using the sort function. If you want to sort a list, you have to access the actual elements since sort require...
Example: Cleaning Columns
Cleaning data frames involves quite different aspects like splitting cell entries, converting data types or the conversion of “wide” to “long” format. In ge...
Example: Merging
When thinking about combining two data frames one has to distinguish between merging them by the values given in a specific column or consecutively putting t...
Unmarked Assignment: Cleaning Crops
This assignment is the first in a series which use regional statistical data. While the wood harvest data from Hessen was (i) quite small and (ii) quite tidy...
unit05
Describe your linear data
Compute simple statistical linear regression models that relate a dependent to an independent variable.
Example: Simple Bivariate Linear Regression
Linear regression modelling is one of the more common tasks in data analysis and the following example will cover the very basic topic of bivariate linear re...
Marked Assignment: Recreation vs. Settlement
This worksheet tackles the question of how the percentage share of settlement area is related to the share of recreation area in each community. After complet...
unit06
Predict your linear data
Compute simple linear models to predict dependent data and assess the performance by independent test samples.
Cross-validation
Test statistics can describe the quality or accuracy of regression models if the assumptions of the models are met. However, the assessment would still be b...
Unmarked Assignment: Recreation vs. Settlement revisited
This worksheet revisits the settlement vs. recreation model and compares to which degree the results describing the performance of the model differ between t...
unit07
Select your variables
Evaluate the importance of your independent variables and select an optimal subset for your prediction model.
Feature selection in multiple variable models
So far, the models have only considered one explanatory (i.e. independent) variable. If another variable should be explained or predicted by more than one va...
Marked Assignment: Wheat vs. everything else
This worksheet uses the crop dataset cleaned previously to extend the prediction of winter wheat to multiple variables using a forward feature selection appr...
unit08
Tune your model
Evaluate model tuning strategies and find optimal settings for your prediction model.
Generalized additive models
So far, the models have only considered linear relationships. The corresponding model type to simple linear models would be an additive model and for poisson...
Unmarked Assignment: Model Tuning
This worksheet uses cross-validation strategies for tuning an additive model. After completing this worksheet you should have improved your skills on handli...
unit09
Predict Your Temporal Data
Look into some specific characteristics of time series data and predict future observations based on past dynamics.
Time Series
Although we already had contact with some temporal datasets, we have not yet taken a closer formal look at time series analysis. Time series datasets often inhibi...
Predicting time series
Time-series analyses can generally be divided into forecasting future dynamics and describing and potentially explaining past patterns. Since the latter ofte...
Unmarked Assignment: Precipitation Forecast
This worksheet introduces you to ARIMA modeling using a precipitation time series recorded at a station near Marburg University Forest. After completing thi...
unit10
Analyse Your Temporal Data
Analyse your time series data and decompose it into seasonal characteristics and long-term trends.
Time Series Decomposition
After looking into time-series forecasting, we will now switch to some basics of describing time series. To illustrate this, we will again use the (mean mont...
Time series clustering
Just as one last example on time series analysis for this module and mainly to demonstrate that this module only touched on a very small set of analysis concepts ...
Unmarked Assignment: NAO and Coelbe
This worksheet focuses on the comparison of some meteorological time series data recorded at a station near Marburg University Forest with some global teleconn...
unit11
Marburg Open Hackathon
Follow the link to start the Marburg Open Hackathon (MOHA)
unit12
Publication Quality Graphics
Visualize your data, get some hints for publication quality graphics, and learn about some packages specifically made for visualizations.
Example: Colours
Before we expand our plotting capabilities, we want to spend a bit more time thinking about colours and colour spaces. A careful study of colour-spaces (e....
Example: Colours and maps
This is a short example on how to use the hcl colour palette for colouring features of a shapefile. Load the required packages library("rgdal") library("ras...
Example: Tics
The following plotting examples will revisit R’s generic plotting functions and pimp them up a little bit. The underlying example data is taken from our dat...
Example: Wide and Long Format
The following is a short note on converting wide to long format required e.g. for some lattice or ggplot visualizations. The following examples are based on ...
Example: The R Graph Gallery
Finally, check out the R Graph Gallery to get an impression of the many more data visualization possibilities in R.