To get information about the parameters in the dataset, download the metadata. Inside this zip folder you will find a file called `Metadaten_Parameter_... .html` with the relevant information.
In this tutorial we will use the following packages:
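Judging from the functions used in the rest of this tutorial (`str_subset`, the rvest scraping helpers, and the `delim`/`col_types` reading options), a likely minimal set is rvest, stringr, and readr. This is an assumption, not a confirmed list:

```r
# Assumed package set, inferred from the functions used below
library(rvest)    # scraping the folder listings
library(stringr)  # str_subset() for filtering file names
library(readr)    # reading delimited files with col_types
```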
To download the data, we first set the base URL:
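As a sketch, the base URL can be stored in a single variable. The address below is a placeholder, not the real data server; the trailing slash matters because later steps append folder names with `paste0()`:

```r
# Placeholder address (assumption): replace with the real data server URL
baseurl <- "https://example.org/data/"
```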
Get folder structure
Files are stored in folders named after the year. To scrape the available years we use the rvest package:
This queries all the `a` elements (links) of the page and extracts the `href` attribute of each.
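The query described above can be sketched as follows. To keep the example runnable offline, it parses a small inline listing with `minimal_html()`; the folder names are made up, and in practice the page would come from `read_html(baseurl)`:

```r
library(rvest)

# Made-up directory listing standing in for read_html(baseurl)
page <- minimal_html('
  <a href="../">../</a>
  <a href="2019/">2019/</a>
  <a href="2020/">2020/</a>
')

# Select every link and keep only its target
years <- page %>%
  html_elements("a") %>%
  html_attr("href")
```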
We get a vector with the following content:
Since the first element is a link to the parent folder, we exclude it:
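A minimal sketch of this step, using made-up folder names, drops the first element with a negative index:

```r
# Hypothetical scrape result: parent-folder link first, then the year folders
years <- c("../", "2019/", "2020/")

# Remove the link to the parent folder
years <- years[-1]
```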
To extract the month folders we use a similar approach. Let’s explain it first without a loop:
`paste0(baseurl, year)` appends the year to the base URL, and `str_subset` keeps only those filenames that correspond to the station that we want to download.
The station id can be taken from this file.
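The two calls can be illustrated in isolation. The base URL, the file names, and the station id below are all hypothetical and only show the mechanics:

```r
library(stringr)

baseurl <- "https://example.org/data/"  # placeholder address (assumption)
year <- "2020/"

# Build the URL of one year folder
url_year <- paste0(baseurl, year)

# Made-up file names as they might appear in a folder listing
files <- c(
  "station_0001_202001.zip",
  "station_0042_202001.zip",
  "station_0042_202002.zip"
)

# Keep only the files of the station of interest (hypothetical id)
station_id <- "0042"
station_files <- str_subset(files, paste0("_", station_id, "_"))
```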
Downloading all data
To start downloading data we first have to create a download folder:
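One way to do this in base R is sketched below; the folder name `download` is an assumption:

```r
# Assumed folder name; created only if it does not exist yet
download_dir <- "download"
if (!dir.exists(download_dir)) {
  dir.create(download_dir)
}
```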
With a nested loop we download the data for every month for every year.
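A hedged sketch of such a nested loop is shown below. The URL, the folder names, and the file name are placeholders, and the month folders are hard-coded here although they would normally be scraped per year. A `dry_run` switch (an addition for illustration) lets the loop run without network access:

```r
baseurl <- "https://example.org/data/"  # placeholder address (assumption)
years <- c("2019/", "2020/")            # hypothetical year folders
months <- c("01/", "02/")               # hypothetical month folders
station_file <- "station_0042.zip"      # hypothetical file name
download_dir <- "download"
dry_run <- TRUE                         # set to FALSE to actually download

dir.create(download_dir, showWarnings = FALSE)

urls <- character(0)
for (year in years) {
  for (month in months) {
    url <- paste0(baseurl, year, month, station_file)
    # Prefix the local file name with year and month to avoid overwriting
    dest <- file.path(
      download_dir,
      paste0(sub("/", "", year), "_", sub("/", "", month), "_", station_file)
    )
    urls <- c(urls, url)
    if (!dry_run) {
      download.file(url, destfile = dest, mode = "wb")
    }
  }
}
```

Using `mode = "wb"` keeps zip files intact on Windows, where the default text mode can corrupt binary downloads.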
- read several files at once and combine the data into a data frame.
- read directly from zip file without the need to extract the files first.
The options we use here are:
- `delim`: sets the field delimiter
- `na`: sets the value that is read as missing (`NA`)
- `trim_ws`: trims whitespace around the data values
- `col_types`: defines column types as double (`d`) or character (`c`)
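These options match the arguments of readr's `read_delim()`. The sketch below applies them to a small inline sample; the semicolon delimiter, the `-999` missing-value code, and the column layout are assumptions chosen for illustration, not values taken from the dataset:

```r
library(readr)

# Made-up sample standing in for one downloaded file
raw <- paste0(
  "station;datetime;temp\n",
  " 0042 ;01.01.2020 00:00; 1.5\n",
  " 0042 ;01.01.2020 01:00;-999\n"
)

dat <- read_delim(
  I(raw),             # I() marks the string as literal data (readr >= 2.0)
  delim = ";",        # assumed delimiter
  na = "-999",        # assumed missing-value code
  trim_ws = TRUE,     # strip padding around values
  col_types = "ccd"   # character, character, double
)
```

Reading the station id as character preserves its leading zeros.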
We use the character column type to read the datetime columns, since the datetime column type (`T`) fails in this case. We then use `as.POSIXct()` with a custom format string to convert the datetime columns without errors.
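As an illustration, the conversion might look like this. The timestamp layout and the UTC time zone are assumptions; the format string has to be adapted to the actual data:

```r
# Hypothetical datetime strings as read with the character column type
datetime_chr <- c("01.01.2020 00:00", "01.01.2020 01:00")

# Convert with an explicit format string; the time zone is assumed to be UTC
datetime <- as.POSIXct(datetime_chr, format = "%d.%m.%Y %H:%M", tz = "UTC")
```

Strings that do not match the format become `NA`, so checking `anyNA(datetime)` after the conversion is a quick sanity test.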
We can save the data with the following command:
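One common option, assumed here, is `saveRDS()`, which stores a single R object together with its column types. The data frame and the file name are made up for the example:

```r
# Small made-up data frame standing in for the combined station data
dat <- data.frame(
  station = c("0042", "0042"),
  temp = c(1.5, NA),
  stringsAsFactors = FALSE
)

# Hypothetical file name
out_file <- "station_data.rds"
saveRDS(dat, out_file)

# In another project the object is restored with readRDS()
dat_loaded <- readRDS(out_file)
```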
This way we can load it directly into our next data analysis project.