
How would one go about performing a regression of an outcome `y` on an independent variable `sin(x-pi)`?

Is it possible to do without a graphic calculator, or in a program such as Excel?

Please also mention what regression models are available when modelling the effect on an outcome `y` of a periodic variable `x`.

Asked by laurfree



mathsworkmusic | (Level 2) Educator


If you wish to carry out a linear regression then create a new variable

`x_1 = sin(x-pi)`

and carry out the regression of `y` on `x_1` in the usual way, ie fit

`y = alpha + beta x_1`

using least squares estimation.
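A minimal sketch of these two steps in Python, assuming `x` is measured in radians. The data are hypothetical and noise-free, so the recovered coefficients can be checked against the values used to generate them; `fit_simple_linear` is an illustrative helper implementing the standard closed-form least-squares formulas, not a library function.

```python
import math

def fit_simple_linear(x1, y):
    """Ordinary least squares for y = alpha + beta * x1 (closed-form solution)."""
    n = len(x1)
    mean_x = sum(x1) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x1, y))
    sxx = sum((a - mean_x) ** 2 for a in x1)
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat

# Hypothetical data: x in radians, y generated exactly as 2 + 3*sin(x - pi)
x = [0.1 * k for k in range(100)]
y = [2.0 + 3.0 * math.sin(xi - math.pi) for xi in x]

# Step 1: create the new variable x_1 = sin(x - pi)
x1 = [math.sin(xi - math.pi) for xi in x]

# Step 2: regress y on x_1 in the usual way
alpha_hat, beta_hat = fit_simple_linear(x1, y)
print(alpha_hat, beta_hat)
```

The same two numbers can be obtained in Excel without a graphic calculator: add a column containing `=SIN(A2-PI())` (assuming `x` is in column A, in radians), then use `=INTERCEPT(y_range, x1_range)` and `=SLOPE(y_range, x1_range)`, where `y_range` and `x1_range` are placeholders for your own cell ranges.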

To predict `y` with the original variable `x`, use your estimated `alpha` and `beta` (`hat(alpha)`, `hat(beta)`) in the prediction equation

`y_(pred) = hat(alpha) + hat(beta)sin(x-pi)`
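By the angle-subtraction formula, `sin(x-pi)` is simply `-sin(x)`, so the same prediction equation can be written in terms of `sin x` with the sign of the slope flipped:

```latex
\sin(x-\pi) = \sin x\cos\pi - \cos x\sin\pi = -\sin x,
\qquad
y_{\text{pred}} = \hat{\alpha} + \hat{\beta}\sin(x-\pi) = \hat{\alpha} - \hat{\beta}\sin x
```

Regressing `y` on `sin(x)` instead would therefore give the same intercept `hat(alpha)` and a slope of `-hat(beta)`.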

This will fit well if `y` and `x_1` have a linear relationship. If the relationship is something other than linear, or there is no relationship at all, it will not fit well. Goodness of fit can be assessed by looking at the residuals `y - y_(pred)` and making sure there is no obvious pattern in them, by checking their distribution with a quantile-quantile plot, or by comparing with an alternative model (e.g. fixed effects for 'seasons' within the period of interest, such as the four quarters of a year).
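The residual check can be made concrete. The sketch below (Python, hypothetical noise-free data) deliberately includes a `cos(2x)` component that the `sin(x-pi)` model cannot capture, then reports R² and the lag-1 autocorrelation of the residuals taken in order of `x`. The autocorrelation is only an assumed numerical stand-in for eyeballing a residual plot: values near 0 suggest patternless residuals, while values near 1 mean systematic structure has been left behind.

```python
import math

def fit_simple_linear(x1, y):
    """Ordinary least squares for y = alpha + beta * x1 (closed-form solution)."""
    n = len(x1)
    mx, my = sum(x1) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x1, y)) / sum((a - mx) ** 2 for a in x1)
    return my - beta * mx, beta

def lag1_autocorrelation(r):
    """Correlation between consecutive residuals; near 0 when there is no pattern."""
    m = sum(r) / len(r)
    num = sum((r[i] - m) * (r[i + 1] - m) for i in range(len(r) - 1))
    den = sum((v - m) ** 2 for v in r)
    return num / den

# Hypothetical data with an extra cos(2x) term the sin(x - pi) model cannot capture
x = [0.1 * k for k in range(100)]  # sorted, so residual order follows x
y = [2.0 + 3.0 * math.sin(xi - math.pi) + 1.5 * math.cos(2 * xi) for xi in x]

x1 = [math.sin(xi - math.pi) for xi in x]
alpha_hat, beta_hat = fit_simple_linear(x1, y)
y_pred = [alpha_hat + beta_hat * v for v in x1]
residuals = [yi - yp for yi, yp in zip(y, y_pred)]

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
mean_y = sum(y) / len(y)
r_squared = 1 - sum(e * e for e in residuals) / sum((yi - mean_y) ** 2 for yi in y)
rho1 = lag1_autocorrelation(residuals)
print(f"R^2 = {r_squared:.3f}, lag-1 residual autocorrelation = {rho1:.3f}")
```

Here the residuals are essentially the missing `1.5 cos(2x)` wave, so their autocorrelation is close to 1 even though R² alone might look acceptable.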

If you have a periodic variable `x` that results in a smoothly changing `y` over the period of interest, it is possible to use a mixture of sine and cosine terms, for example

`y = alpha + beta sin[a(x-b)] + gamma sin[c(x-d)]`

(This is harmonic regression, but allowing a general phase shift in each term.) It allows your curve for `y` to combine lower-frequency and higher-frequency components that may be out of phase with each other. The frequencies `a` and `c` can't be estimated with the standard linear regression method, but you may have an idea about them from your data. If `a` and `c` are known, the phases `b` and `d` can be absorbed by replacing each phase-shifted sine with a sine term plus a cosine term at the same frequency, which makes the model linear in its coefficients. Having more than one `x` variable on the right-hand side does mean that you need multiple linear regression, which you may not have come across yet (in matrix form, `y = X beta`, where the columns of the matrix `X` are the explanatory variables). If you have a lot of data, you could add more terms of the same type. A balance should be sought between the number of variables added and the goodness of fit (Occam's Razor: keep it as simple as is reasonable).
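A sketch of the known-frequency case in Python with NumPy. Each phase-shifted term expands as `beta sin[a(x-b)] = beta cos(ab) sin(ax) - beta sin(ab) cos(ax)`, so with `a` and `c` fixed the model is linear in the coefficients of `sin(ax)`, `cos(ax)`, `sin(cx)` and `cos(cx)`, and ordinary multiple least squares applies; `beta` and `b` are recovered afterwards from the sine and cosine coefficients. The frequencies (1 and 2), the data and the helper name `fit_harmonic` are all hypothetical.

```python
import numpy as np

def fit_harmonic(x, y, freqs):
    """Least squares for y = alpha + sum_k [p_k sin(f_k x) + q_k cos(f_k x)], freqs known."""
    columns = [np.ones_like(x)]
    for f in freqs:
        columns += [np.sin(f * x), np.cos(f * x)]
    X = np.column_stack(columns)  # design matrix, so that y is approximately X @ coef
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha = coef[0]
    terms = []
    for k, f in enumerate(freqs):
        p, q = coef[1 + 2 * k], coef[2 + 2 * k]
        amplitude = np.hypot(p, q)     # beta (or gamma) in beta*sin[f(x - shift)]
        shift = np.arctan2(-q, p) / f  # b (or d), only defined modulo 2*pi/f
        terms.append((amplitude, shift))
    return alpha, terms

# Hypothetical data: y = 1 + 2 sin[1(x - 0.5)] + 0.7 sin[2(x - 0.3)], frequencies assumed known
x = np.linspace(0, 4 * np.pi, 200)
y = 1 + 2 * np.sin(x - 0.5) + 0.7 * np.sin(2 * (x - 0.3))
alpha, terms = fit_harmonic(x, y, freqs=[1.0, 2.0])
print(alpha, terms)
```

Because this is just a linear regression on four extra columns, the same fit could in principle be done by any multiple-regression tool once the sine and cosine columns have been computed.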

Potentially, any model can be fitted by optimisation (a computerised search over the n-dimensional space of possible values for your n unknown coefficients). In practice, the search method isn't always successful; in particular, it may find a local rather than the global optimum. Even if the coefficients you read from the computer output don't look strange, the model you have chosen may be an inappropriate one that is fairly meaningless. The more coefficients you have, and the less you know about what values to expect for them, the more likely you are to end up with a poor model.
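A sketch of fitting a model with an unknown frequency by search, in Python with NumPy. This is a simple grid search standing in for a general-purpose optimiser such as `optimx`, not a reproduction of it: for each candidate `a`, the intercept and the sine/cosine coefficients are found exactly by least squares, so only a one-dimensional search remains. The data, the true frequency (1.7) and the grid are hypothetical. Counting local minima of the residual sum of squares along the grid shows why a local optimiser started at the wrong frequency can stop at the wrong answer.

```python
import numpy as np

def rss_for_frequency(a, x, y):
    """Residual sum of squares of y = alpha + p sin(a x) + q cos(a x) with a held fixed."""
    X = np.column_stack([np.ones_like(x), np.sin(a * x), np.cos(a * x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def fit_unknown_frequency(x, y, a_grid):
    """Evaluate every candidate frequency and keep the one with the smallest RSS."""
    rss = np.array([rss_for_frequency(a, x, y) for a in a_grid])
    best = int(np.argmin(rss))
    # Interior grid points lower than both neighbours are local minima of the RSS curve
    n_local_minima = int(np.sum((rss[1:-1] < rss[:-2]) & (rss[1:-1] < rss[2:])))
    return a_grid[best], rss, n_local_minima

# Hypothetical data: true frequency a = 1.7, phase 0.4, plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 300)
y = 0.5 + 2 * np.sin(1.7 * (x - 0.4)) + rng.normal(scale=0.1, size=x.size)

a_grid = np.linspace(0.1, 5.0, 2000)
a_hat, rss, n_local_minima = fit_unknown_frequency(x, y, a_grid)
print(f"estimated a = {a_hat:.3f}; RSS curve has {n_local_minima} local minima on the grid")
```

A local optimiser could then refine the estimate, starting from the best grid point rather than from an arbitrary guess.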

One program you can use to optimise models is R. It is free and is maintained by the academic community that uses it. The optimisation command that I use is `optimx`, from the add-on package of the same name. R can be downloaded from CRAN (the Comprehensive R Archive Network). Simple and multiple linear regression are possible in Excel (for example with the `LINEST` function or the Analysis ToolPak's Regression tool), but I very much doubt that harmonic regression with unknown frequencies, for example, is available in that software.

Treat `sin(x-pi)` as a new variable `x_1` and fit a simple linear regression of `y` on `x_1`. If `y` and `x_1` do have a linear relationship, this model will fit well. If `x` is a periodic variable, there are more complex models that could be used, involving cosine terms as well; those whose frequencies must also be estimated can't be fitted on a pocket calculator or in Excel and need a numerical optimiser.

