# Fit periodic sine model

In this post I will demonstrate that a periodic signal can be fitted with a linear model using a sine and cosine transform on the input signal.

We will be using the following packages:

We will be using data from the German weather service (DWD) which we can download in the following way:

The data is in csv format and can be read with `read_delim`

from the readr package:

We add the `yday`

column by using the `yday`

function of the lubridate package on `MESS_DATUM`

. We then transform yday to radians of the 365 day yearly cycle:

This way we can use it directly in the sin and cos functions we will be using later.

Let’s dive right in by calculating the first linear model. Yo may be familiar to using the log transform on the input variable to fit log distributed data. You also can use the cos or sin functions to transform the input variable. Since the input signal also can be phase shifted we have to use a combination of sin and cos. Hence we define the model in the following way:

We add the predicted values and residuals to the original data frame:

This way we can add the prediction to a scatterplot of the data:

You can see that we have a pretty good fit. Let’s look at the residuals to verify that:

I added a `geom_smooth()`

to the plot which fits a loess function to the data. This helps to identify remaining variation in the mean of the residuals. You can see a slight wobble in the loess which indicates that there is a weak second frequency in the data. But I consider it too weak to take into account.

For now we will ignore that the distribution around the mean is not constant. There are other methods of finding residual oscillations in the data like the ACF function. We will not deal with that here.

# calculate phase shift

An arbitrary cosine signal can be described in two ways:

\[A \cdot cos(\omega t + \varphi) = B \cdot cos(\omega t) + C \cdot sin(\omega t)\]where on the left side we have \(A\), which is is the amplitude of the oscillation and \(\varphi\), which is the phase shift. \(\omega\) is the angular frequency.

On the right side we have the factors \(B\) and \(C\) which correspond to \(A\) in the following way:

\[A = \sqrt{B^2+C^2}\]\(\varphi\) is related as follows:

\[\varphi=atan\left(\frac{B}{C}\right)\]R has an `atan2()`

function which takes into account the signs of both arguments to determine the correct quadrant of the resulting angle.

This way we can extract the following metrics:

The phase angle is transformed back to days in the 365 day cycle.

Let’s show the phase angle in a plot:

The orange line shows the minimum of the sine curve.

# fit sine models with additional subfrequencies

Not every signal can be described by a single sine wave. We can easily add additional subfrequencies to the linear model.

To demonstrate this we use solar radiation data from the same station. This can be downloaded as followed:

We read the data the same way as the temperature data:

We filter data at solar noon (MESS_DATUM_WOZ == 12). WOZ is the real local time (Wahre Ortszeit in German).

We extract the max and mean for every day of the yearly cycle for all years in the data set:

The syntax is from the dplyr package. We use `group_by()`

to group the data by yday and then use the `summarise()`

function to calculate the mean and max of the radiation data. Finally we get rid of observations with missing data using the `na.omit`

function.

To fit a sine model to the data we use the same syntax as for the temperature model:

The next plot shows all data in the background and the maximum for every day in the year highlighted as solid dots. We can see that the fitted sine wave does not describe the data perfectly.

Note that there are wavy patterns in the scatterplot that probably derive from the measuerement procedure. We will not deal with that.

If we plot the residuals we can clearly see that they contain an additional frequency of two times the yearly cycle.

To improve the fitting we can introduce additional subfrequencies into the model:

The anova (analysis of variance) table shows us that `sin(2*yday_rad)`

and `cos(2*yday_rad)`

frequencies are calculated as significant, while the `sin(3*yday_rad)`

and `cos(3*yday_rad)`

are not:

We therefore could remove the third frequency from the model.

Let’s fit the same model to the mean radiation that we calculated earlier:

Since subfrequencies can have different phase shift to the main frequency we calculate the maximum of the model as a substitute for the overall phase shift. We will be using that later.

A look into the anova table shows us that here the third frequency is also significant:

To show all data and fitted models together we plot the following:

Red dots and the orange curve are the mean radiation data. We can see that both models fit the data quite good.

Again let’s look at the residuals:

By checking the overlayed loess function we see that there is no significant residual frequency in the residuals. The same is true if the plot the residuals of the mean model.

# compare phase shift

To compare the phase shift between the radiation data model and the temperature model we plot both functions into the same plot:

We can see that the maximum of the mean radiation fitting is a bit later than the maximum radiation. The maximum temperature is a good 36 days behind the maximum of radiation. This is only logical since radiation is the driver of the atmospheric warming and the whole landmass/water body/atmosphere system has some heat capacity to buffer the heating process.

We can visualize the lag in the radiation and temperature curve in an interesting way. For that we make predictions of all our models for a single yearly cycle:

The plot is then generated as followed:

The plot shows the hysteresis between solar radiation and air temperature. Higher radiation is not directly translated into higher air temperature. There is a lag between the signals. We see that the cooling off phase in autumn is more direct as the warming in spring. The curve is more straight from August to December and takes a bit of a detour between January and July.

Here we have the same picture for the mean radiation:

Note that in this type of plot a perfect 90 degree phase shift between the two signals would create a perfect circle. A total sync of the phase would create a line (100% correlation between the signals). Every phase shift in between creates an oval. Since one of our signals is combined from 3 different frequencies the oval get’s flattened on one side.