Automated setup of working environment

Setting up a working or project environment requires defining various folder paths and loading the necessary R packages and additional functions. If, in addition, the external APIs (application programming interfaces) of QGIS, SAGA, GRASS, or Orfeo Toolbox (to name the most important) are to be integrated reliably and with little effort, the associated paths and environment variables must also be defined correctly.

Basic idea

Several R packages, e.g. workflowr or usethis, provide a wide range of functions for such tasks. As an entry point into a structured organization of R-based development projects, we suggest a slimmed-down version.

Essentially, four categories of tasks need to be covered:

  • Organization of data
  • Organization of scripts
  • Organization of documentation
  • Organization of environment variables for external programs

All of these categories rest on an adequate storage structure on a suitable permanent storage medium (hard disk, USB stick, cloud, etc.). We suggest a meaningful hierarchical directory structure in which the root folder of a project is the starting point from which all other folders branch off.

Defining folders manually

In the following, the folders are first defined manually as an example.

# define a project rootfolder
rootDir = "~/edu/mpg-envinsys-plygrnd"     # This is the rootfolder of the whole project 

# Set project specific subfolders
projectDirList   = c("data/",                # data folders; the following are obligatory, but you may add more
                    "data/auxdata/",  
                    "data/aerial/org/",
                    "data/lidar/org/",
                    "data/lidar/",
                    "data/grass/",
                    "data/lidar/level1/",
                    "data/lidar/level2/",
                    "data/lidar/level0/",
                    "data/data_mof", 
                    "data/tmp/",
                    "run/",                # folder for runtime data storage
                    "log/",                # logging
                    "src/",                # source code
                    "doc/",                # documentation  
                    "name_of_github_team_repository/src/",   # source code github
                    "name_of_github_team_repository/doc/")   # markdown etc.  github

                    
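So far these are only character strings; nothing exists on disk yet. As a minimal base R sketch (not the envimaR approach introduced below), such a list of folders could be created by hand as follows. The names demoRoot, demoDirs and demoPaths are made up for this illustration, and a temporary directory is used so that nothing in your home folder is touched.

```r
# Minimal base-R sketch: create a folder tree below a root directory.
# demoRoot, demoDirs and demoPaths are illustrative names, not part of the course setup.
demoRoot = file.path(tempdir(), "mpg-envinsys-plygrnd")

demoDirs = c("data/lidar/org/",
             "data/tmp/",
             "src/",
             "doc/")

# drop trailing slashes so the paths behave the same on all platforms
demoDirs = sub("/+$", "", demoDirs)

# build the full paths and create them, including missing parent folders
demoPaths = file.path(demoRoot, demoDirs)
for (p in demoPaths) {
  dir.create(p, recursive = TRUE, showWarnings = FALSE)
}

# check which of the folders exist now
dir.exists(demoPaths)
```

Doing this by hand works, but it has to be repeated and kept in sync in every script, which is the motivation for a helper package.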

Introduction of the envimaR helper package

It would now be convenient if the folders defined in this list were created automatically and their paths stored for later use. For the needs of the course we have written a small project management package called envimaR that takes over these tasks. It is hosted on GitHub and can be installed with devtools:

devtools::install_github("envima/envimaR")

First I want to find out which folder structure can sensibly be used on my system. For example, the so-called H: drive on the pool PCs is extremely problematic due to the underlying dfs// network assignment and should therefore be avoided. To check automatically which computer I am currently working on (and therefore which root directory I want to use), I use the function envimaR::alternativeEnvi.

require(envimaR)
envimaR::alternativeEnvi(root_folder = rootDir,              # if exist this is the root dir 
                         alt_env_id = "COMPUTERNAME",        # check the environment variable "COMPUTERNAME"
                         alt_env_value = "PCRZP",            # if it contains the string "PCRZP" (e.g. local PC-Pools)
                         alt_env_root_folder = "F:/BEN/edu") # use the alternative rootfolder
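To make the idea behind this check more tangible, here is a rough base R sketch of the logic as described in the comments above. It is an assumption about the behavior, not the envimaR source code, and the actual implementation may differ. The function chooseRoot and the variable DEMO_COMPUTERNAME are hypothetical; a separate variable is used so that the real COMPUTERNAME of your session is not overwritten.

```r
# Hypothetical helper (not part of envimaR): return the alternative root folder
# if the given environment variable contains a certain string, otherwise the default.
chooseRoot = function(root_folder, alt_env_id, alt_env_value, alt_env_root_folder) {
  envValue = Sys.getenv(alt_env_id, unset = "")
  if (nzchar(envValue) && grepl(alt_env_value, envValue, fixed = TRUE)) {
    alt_env_root_folder
  } else {
    root_folder
  }
}

# simulate a pool PC with a made-up variable, for this R session only
Sys.setenv(DEMO_COMPUTERNAME = "PCRZP042")
rootOnPoolPC = chooseRoot("~/edu/mpg-envinsys-plygrnd", "DEMO_COMPUTERNAME",
                          "PCRZP", "F:/BEN/edu")

# simulate any other machine: the default root folder is kept
Sys.setenv(DEMO_COMPUTERNAME = "MYLAPTOP")
rootElsewhere = chooseRoot("~/edu/mpg-envinsys-plygrnd", "DEMO_COMPUTERNAME",
                           "PCRZP", "F:/BEN/edu")

rootOnPoolPC
rootElsewhere
```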

Suppose I want to create a project with the obligatory folder structure defined above, check which PC I am working on, load all packages I need, and store all environment variables in a list for later use. The createEnvi function does all of this. To use it, I first have to define a vector of all packages that I want to load.

# list of packages to load
packagesToLoad = c("lidR", "link2GI", "mapview", "raster", "rgdal", "rlas", "sp", "uavRst")

# Automatically set root directory, folder structure and load libraries
envrmt = envimaR::createEnvi(root_folder = rootDir,
                             folders = projectDirList,
                             path_prefix = "path_",              # prefix to all path variables that are created 
                             libs = packagesToLoad,              # list of R packages that should be loaded
                             alt_env_id = "COMPUTERNAME",        # check the environment variable "COMPUTERNAME"
                             alt_env_value = "PCRZP",            # if it contains the string "PCRZP" (e.g. local PC-Pools)
                             alt_env_root_folder = "F:/BEN/edu") # use the alternative rootfolder
                         

I will receive something like the following messages. Note that these are not error messages, even if they are printed in red.

Loading required package: lidR
Loading required package: raster
Loading required package: sp
lidR 2.1.2 using 2 threads. Help on <gis.stackexchange.com>. Bug report on <github.com/Jean-Romain/lidR>.
Loading required package: link2GI
Loading required package: mapview
Loading required package: rgdal
rgdal: version: 1.4-7, (SVN revision 845)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 3.0.1, released 2019/06/28
 Path to GDAL shared files: 
 GDAL binary built with GEOS: TRUE 
 Loaded PROJ.4 runtime: Rel. 6.2.0, September 1st, 2019, [PJ_VERSION: 620]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.3-1 
Loading required package: rlas
Loading required package: uavRst
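The value returned by createEnvi is a list of paths whose names carry the chosen prefix (the setup script later uses an entry called path_tmp). The following base R sketch illustrates how such a list could be built. It is a simplification under stated assumptions and not the envimaR implementation: makePathList is a hypothetical helper, and naming each entry after the last folder name is an assumption that produces duplicate names when two folders share that name (e.g. two folders called org).

```r
# Hypothetical helper (not the envimaR implementation): build a named list of
# full paths, one entry per folder, named <prefix><last folder name>.
makePathList = function(root_folder, folders, path_prefix = "path_") {
  folders = sub("/+$", "", folders)                 # drop trailing slashes
  paths = as.list(file.path(root_folder, folders))  # one full path per folder
  names(paths) = paste0(path_prefix, basename(folders))
  paths
}

demoEnv = makePathList(file.path(tempdir(), "demo"),
                       c("data/tmp/", "data/lidar/level1/", "src/"))

names(demoEnv)    # entry names with the "path_" prefix
demoEnv$path_tmp  # full path of the tmp folder
```

Accessing paths by name keeps analysis scripts free of hard-coded directories.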

Wrap it up in a setup script

Finally, a few useful settings remain to be made. It makes sense to have the current GitHub versions of the non-CRAN packages available, and for the raster package you should also set the directory used for temporary files.
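Reinstalling from GitHub on every run keeps the packages current but takes time. If you only want to make sure that the packages are present at all, a check like the following sketch could be used. The helpers isInstalled and missingPackages are hypothetical names introduced for this illustration, and note that this variant checks availability only, not whether the installed version is up to date.

```r
# Hypothetical helpers: find out which packages are not installed,
# without loading any of them.
isInstalled = function(pkg) {
  nzchar(system.file(package = pkg))   # "" is returned for unknown packages
}

missingPackages = function(pkgs) {
  pkgs[!vapply(pkgs, isInstalled, logical(1))]
}

# "stats" and "utils" ship with R; the third name is deliberately made up
missingPackages(c("stats", "utils", "thisPackageDoesNotExist123"))

# Possible use in a setup script (commented out, as it needs internet access):
# if (length(missingPackages("envimaR")) > 0) {
#   devtools::install_github("envima/envimaR")
# }
```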

If you put everything together in one script it looks like this:

### mpg course basic setup
# install/check from github
devtools::install_github("envima/envimaR")
devtools::install_github("gisma/uavRst")
devtools::install_github("r-spatial/link2GI")

packagesToLoad = c("lidR", "link2GI", "mapview", "raster", "rgdal", "rlas", "sp", "uavRst", "sf")

# load envimaR and determine the root folder
require(envimaR)
rootDir = envimaR::alternativeEnvi(root_folder = "~/edu/mpg-envinsys-plygrnd",
                                       alt_env_id = "COMPUTERNAME",
                                       alt_env_value = "PCRZP",
                                       alt_env_root_folder = "F:/BEN/edu")


# Set project specific subfolders
projectDirList   = c("data/",                # data folders; the following are obligatory, but you may add more
                    "data/auxdata/",  
                    "data/aerial/org/",
                    "data/lidar/org/",
                    "data/lidar/",
                    "data/grass/",
                    "data/lidar/level1/",
                    "data/lidar/level2/",
                    "data/lidar/level0/",
                    "data/data_mof", 
                    "data/tmp/",
                    "run/",                # folder for runtime data storage
                    "log/",                # logging
                    "src/",                # source code
                    "doc/")                # documentation markdown etc.

# Automatically set root directory, folder structure and load libraries
envrmt = envimaR::createEnvi(root_folder = rootDir,
                             folders = projectDirList,
                             path_prefix = "path_",
                             libs = packagesToLoad,
                             alt_env_id = "COMPUTERNAME",
                             alt_env_value = "PCRZP",
                             alt_env_root_folder = "F:/BEN/edu")
## set raster temp path
raster::rasterOptions(tmpdir = envrmt$path_tmp)

Please check the result by navigating to the directory with your favorite file manager. In addition, please check the returned list. It contains all paths as character strings in a convenient list structure.

# traditionally
str(envrmt)

# more fancy
require(listviewer)
listviewer::jsonedit(envrmt)  
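Besides looking at the list, it can also be checked programmatically that every path in it exists on disk. The sketch below uses a small stand-in list called demoEnvrmt with made-up paths in a temporary directory; with the real setup you would pass envrmt instead, assuming it holds one path string per entry.

```r
# Stand-in for the list returned by createEnvi (hypothetical example paths)
demoEnvrmt = list(path_tmp = file.path(tempdir(), "check-demo", "data", "tmp"),
                  path_src = file.path(tempdir(), "check-demo", "src"))

# create only one of the two folders; path_src is deliberately left out
dir.create(demoEnvrmt$path_tmp, recursive = TRUE, showWarnings = FALSE)

# one TRUE/FALSE per list entry
pathExists = vapply(demoEnvrmt, dir.exists, logical(1))

# names of all entries whose folder is missing
names(pathExists)[!pathExists]
```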

Concluding remarks and considerations

It is very useful to save this script in the src folder (e.g. as mpg_course_basic_setup.R) and to source it at the start of every analysis script connected with this project:

source(file.path(envimaR::alternativeEnvi(root_folder = "~/edu/mpg-envinsys-plygrnd",
                                       alt_env_id = "COMPUTERNAME",
                                       alt_env_value = "PCRZP",
                                       alt_env_root_folder = "F:/BEN/edu"),
                  "src/mpg_course_basic_setup.R"))

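If the setup script is missing, source stops with a rather generic error. A small guard at the top of an analysis script can give a clearer message. The following self-contained sketch demonstrates the pattern with a throwaway script in a temporary directory; the file name demo_setup.R and the variable demoSetupDone are made up for this illustration.

```r
# Create a throwaway project with a tiny setup script (illustration only)
demoProject = file.path(tempdir(), "source-demo")
dir.create(file.path(demoProject, "src"), recursive = TRUE, showWarnings = FALSE)
setupScript = file.path(demoProject, "src", "demo_setup.R")
writeLines("demoSetupDone = TRUE", setupScript)

# Guarded sourcing: fail early with a clear message if the script is missing
if (file.exists(setupScript)) {
  source(setupScript)
} else {
  stop("Setup script not found: ", setupScript)
}

demoSetupDone   # defined by the sourced script
```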
The script thus provides, as intended:

  • a folder structure for the data
  • a folder structure for the scripts
  • a list variable containing all paths
  • a folder for the documentation

What is still missing is the organization of environment variables for external programs. We will integrate this soon.
