Polynomial Regression - Examples

The purpose of this example is to demonstrate that simple linear regression can fail even in the simplest of cases. We will use the residual plot of the simple linear regression to help us expand the model into a polynomial model.

This example covers two cases of polynomial regression. The first data set forms an exact parabolic curve; the second data set forms an approximate parabolic curve. We will no longer look at manual computations. However, if you want to carry out the parameter estimation manually, or use e.g. a Microsoft Excel spreadsheet to help with the computations, the required formulae are given in the section Polynomial Regression Computational Steps.

The cases are developed first using the Microsoft Excel Trendline function, and then by running the Microsoft Excel Regression Analysis Tool on the data. The regression analysis is performed only on the second data set. The objective is to demonstrate how the regression output (the overall regression test, parameter significance, and residual plots) is interpreted.

Note: Please keep in mind that all statements made here with respect to polynomial regression are also valid for other regression models.

Case 1: Perfect Polynomial Regression

For this case the data values were generated using the parabolic function y = -0.01x² + 0.5x + 0.05. Just for fun, a simple linear regression model was first developed. Clearly, the model does not fit. I know, we didn't expect it to fit; how could we successfully fit a line to a curve anyway! Next, we want to test whether the Trendline function of Microsoft Excel (choosing second-degree polynomial regression) can regenerate the original parabolic function. As you can see from the animation, the polynomial regression reproduces exactly the same function (with R² = 1). Please recall that regression modeling places the 'line' so that the sum of squares of errors (SSE) is minimized.
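The actual data values are not listed here, but the idea is easy to reproduce outside Excel. Below is a minimal Python sketch (the x range and step are assumptions, since the original data are not given) that generates exact parabolic data and fits a second-degree polynomial by least squares; the fit recovers the generating coefficients with R² = 1:

```python
import numpy as np

# Assumed x range and step; the original Excel data values are not given.
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05      # exact parabola, no error term

# Least-squares fit of a 2nd-degree polynomial (coefficients, highest power first).
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

sse = np.sum((y - y_hat) ** 2)          # sum of squares of errors
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = 1.0 - sse / sst

print(coeffs)   # approximately [-0.01, 0.5, 0.05], the generating coefficients
print(r2)       # approximately 1.0, since the data lie exactly on the parabola
```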

Case 2: Errors Introduced - Polynomial Regression

In this case the data pattern contains errors, but still follows the parabolic pattern of the previous example fairly closely.

First, a simple linear regression model is developed. As you can see from the animation, simple linear regression suggests that there is no regression: the slope coefficient of the line is zero and R² = 0. With these kinds of results in practical situations, you might conclude that there is no regression.
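To see why this happens, consider a sketch with assumed data: the Case 1 parabola plus random error (the error size is also an assumption). Because the assumed x values are symmetric about the vertex of the parabola (x = 25), the curvature on the two sides cancels out and the best-fitting straight line is essentially flat:

```python
import numpy as np

# Assumed data: the Case 1 parabola plus random error.
rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, size=x.size)

# Simple linear regression: polyfit with degree 1 returns [slope, intercept].
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b1 * x + b0
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(b1)   # close to 0 -- the fitted line is essentially flat
print(r2)   # close to 0 -- the line explains almost none of the variation
```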

If, however, you study the scatter plots of variable pairs, you may identify patterns. Such patterns may call for variable transformations or other types of models. Because the above scatter plot shows a strong parabolic pattern, we attempt to fit a second-degree polynomial to the data using the Trendline function of Microsoft Excel. Visual analysis of the fitted polynomial confirms that the parabola appears to fit very well. How about multivariate cases, what can we do then? The answer is that most commonly we use residual plots.
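As a rough analogue of the Excel Trendline step, the sketch below fits a second-degree polynomial to the same assumed data (regenerated with the same seed as in the previous sketch):

```python
import numpy as np

# Same assumed data as in the previous sketch.
rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, size=x.size)

# Second-degree polynomial fit, the analogue of a polynomial Trendline.
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(coeffs)  # near [-0.01, 0.5, 0.05], the generating values, up to the noise
print(r2)      # much closer to 1 than the straight-line fit
```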

Note: In multivariate cases we visually analyze the residual plots and make model improvements, with the goal that an ideal residual plot displays a random horizontal pattern of points of equal width. Please recall that OLS calls for normally distributed errors with mean zero (dots lie about equally above and below zero) and constant variance (dots form a horizontal band of equal width), and that the errors are independent (dots are randomly distributed and do not form distinct patterns).

The above animation of the residual plots shows the two situations for the second data set: the residual plot for the simple linear regression model, and the residual plot for the polynomial regression model. You should see a greatly improved residual plot for the polynomial model. Please recall that we are looking for support for the OLS assumptions: the desirable residual plot displays a horizontal band of points of equal width, randomly distributed.
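The animations themselves are not reproduced here, but the comparison is easy to sketch with the assumed data from above: the linear model's residuals trace the leftover parabola, while the polynomial model's residuals form the desired random horizontal band.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same assumed data as in the earlier sketches.
rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, size=x.size)

# Residuals (observed minus fitted) for the linear and polynomial models.
res_linear = y - np.polyval(np.polyfit(x, y, deg=1), x)
res_poly = y - np.polyval(np.polyfit(x, y, deg=2), x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3), sharey=True)
ax1.scatter(x, res_linear)
ax1.axhline(0.0, color="gray")
ax1.set_title("Residuals: linear model (parabolic pattern)")
ax2.scatter(x, res_poly)
ax2.axhline(0.0, color="gray")
ax2.set_title("Residuals: polynomial model (random band)")
plt.show()
```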

Remember that residual plots can equally be used for model improvement in simple linear, multiple linear, and polynomial regression models (and beyond).

Note: Please note that the introduction of the polynomial term helped only in this particular case. In other situations you may want to consider variable transformations or other types of regression models (non-linear, logistic, ...). Often you may have to go back to your data and try something else, e.g. data stratification. Always, always continue to be on the lookout for better models, because the model you just developed may not be the best one.

Test of Overall Regression - Fcalc = 570.4 and Signif F = 0.000 < 0.05 suggest that the overall regression is significant.

Parameter Significance - The tcalc values and the corresponding P-values suggest that the parameter b0 is not significant (P-value = 0.752), whereas the parameters b1 (P-value = 0.000) and b2 (P-value = 0.000) are significant. These conclusions are also supported by the confidence intervals: only the confidence interval for b0 contains zero.
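The same output items (Fcalc, Signif F, tcalc, P-values, and confidence intervals) can be reproduced outside Excel. Here is a sketch using statsmodels on the assumed data from the earlier sketches; the exact numbers will differ from those quoted above, since the original data set is not given:

```python
import numpy as np
import statsmodels.api as sm

# Same assumed data as in the earlier sketches.
rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, size=x.size)

# Design matrix with columns 1, x, x^2 -- a polynomial model is fitted
# as a multiple linear regression on x and x^2.
X = sm.add_constant(np.column_stack([x, x**2]))
results = sm.OLS(y, X).fit()

print(results.fvalue, results.f_pvalue)   # overall regression: Fcalc and Signif F
print(results.tvalues, results.pvalues)   # tcalc and P-values for b0, b1, b2
print(results.conf_int())                 # 95% confidence intervals for b0, b1, b2
```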

All tests look good, the residual plot supports the OLS assumptions quite well, and the coefficients make sense. The model should be rerun without b0, because b0 is not significant. So far, all indications point toward a polynomial model.
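A sketch of that follow-up step, again on the assumed data: dropping the constant column from the design matrix forces b0 to zero.

```python
import numpy as np
import statsmodels.api as sm

# Same assumed data as in the earlier sketches.
rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0, 1.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, size=x.size)

# No constant column, so the model is y = b1*x + b2*x^2 (b0 forced to zero).
X = np.column_stack([x, x**2])
results = sm.OLS(y, X).fit()

print(results.params)     # estimates for b1 and b2 only
print(results.pvalues)    # check that b1 and b2 remain significant
```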