**Polynomial Regression - Examples**

The purpose of this example is to demonstrate that simple linear
regression can fail even in the simplest of cases. We will
use the *residual plot* of the simple linear regression to
help us expand the model into a polynomial model.

This example covers two cases of polynomial regression.
The first data set forms an *exact* parabolic curve,
and the second data set forms an *approximate*
parabolic curve. We will no longer look at manual computations.
However, if you want to carry out the parameter estimation
manually, or use, e.g., a *Microsoft Excel* spreadsheet to help
with the computations, the required formulae are given in the section
Polynomial Regression Computational Steps.
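If you would rather script the computation than use a spreadsheet, the parameter estimation can be sketched in a few lines of Python. This is an illustrative sketch, not the formulae from that section; the function name and the example x values are assumptions made here:

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Estimate b0, b1, ..., bk by solving the least-squares
    normal equations (X'X) b = X'y for a polynomial model."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example: data generated from an exact parabola are recovered exactly
x = np.arange(0.0, 51.0)
y = -0.01 * x**2 + 0.5 * x + 0.05
b = fit_polynomial(x, y, 2)
print(b)  # approximately [0.05, 0.5, -0.01]
```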

The cases are developed first using the *Microsoft
Excel Trendline Function*, and then running the *Microsoft
Excel Regression Analysis Tool* on the data. The regression
analysis is performed only on the second data set. The objective
is to demonstrate

- how perfect polynomial regression can occur,
- how a linear model can show no regression, and
- how a residual plot can be used to improve a regression model.

**Note:** Please keep in mind that all statements made here
with respect to polynomial regression are also valid with other
regression models.

**Let's look at the two cases.**

**Case 1: Perfect Polynomial Regression**

For this case the data values were generated using the parabolic
function *y = -0.01x² + 0.5x + 0.05*. Just for
fun, a simple linear regression model was first developed.
Clearly, the model does not fit. We didn't expect it to fit,
of course; how could we fit a line to a curve successfully anyway?
Next, we fit a second degree polynomial to the data; since the data were generated from exactly that function, the polynomial fits perfectly.
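Case 1 can be sketched numerically. The x values 0 through 50 below are an assumption for illustration (the text does not list the data); with this symmetric x range, a straight line explains essentially none of the variation while the second degree polynomial fits perfectly:

```python
import numpy as np

x = np.arange(0.0, 51.0)              # assumed x values, symmetric about the vertex
y = -0.01 * x**2 + 0.5 * x + 0.05     # the exact parabola from the text

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

r2_line = r_squared(y, np.polyval(np.polyfit(x, y, 1), x))
r2_quad = r_squared(y, np.polyval(np.polyfit(x, y, 2), x))

print(r2_line)  # essentially 0: the line cannot follow the curvature
print(r2_quad)  # essentially 1: the generating parabola is recovered
```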

**Case 2: Errors Introduced - Polynomial Regression**

In this case the data pattern contains errors, but it follows the parabolic pattern of the previous example fairly closely.

First, a *simple linear regression model* is developed.
As you can see from the animation, simple linear
regression suggests that there is no regression:
the slope coefficient of the line is zero and *R² = 0*.
With these kinds of results in practical situations, you might
conclude that there is no regression.
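This zero-slope outcome is easy to reproduce. In the sketch below the x range, noise level, and random seed are assumptions for illustration (the text's actual data are not given); with x values symmetric about the parabola's vertex, the fitted line comes out flat:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(0.0, 51.0)              # symmetric about the vertex at x = 25
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, x.size)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(slope)  # near zero
print(r2)     # near zero: "no regression" by the usual summary numbers
```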

If you, however, study the
scatter plots of variable pairs, you may identify patterns. Such
patterns may call for variable transformations or other types of
models. Because the above scatter plot shows a strong parabolic
pattern, we fit a second degree polynomial
to the data using the *Trendline* function of *Microsoft
Excel*. Visual analysis of the fitted polynomial
confirms that the parabola appears to fit very well. But how
about multivariate cases, where we cannot see the fit directly?
The answer is that most commonly we use residual plots.

**Note:** In multivariate
cases we visually analyze the residual plots and make model
improvements, with the goal that an ideal residual plot should
display a *random horizontal pattern of points of equal width*.
Please recall that *OLS* calls for *normally distributed
errors with mean zero* (dots are about equally above and below
zero) and *constant variance* (dots form a horizontal band
of equal width), and that the *errors are independent*
(dots are randomly distributed and do not form distinct patterns).

The above animation of the residual plots shows the two
situations for the second data set: the residual plot for the
simple linear regression model, and the residual plot for the
polynomial regression model. You should see a greatly improved
residual plot for the polynomial model. Please recall that we
are looking for support of the *OLS* assumptions. The desirable residual
plot should display a horizontal band of points of equal width,
randomly distributed.
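The contrast between the two residual plots can also be sketched numerically. The x range, noise level, and seed below are illustrative assumptions, since the actual values behind the animation are not given in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(0.0, 51.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, x.size)

# Residuals of the straight-line model: a strong arched pattern remains
res_line = y - np.polyval(np.polyfit(x, y, 1), x)

# Residuals of the second degree polynomial: a random horizontal band
res_quad = y - np.polyval(np.polyfit(x, y, 2), x)

print(res_line.std())  # large: curvature is left in the residuals
print(res_quad.std())  # close to the noise level used above
```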

Remember that residual plots can equally well be used for model improvement in simple linear, multiple linear, and polynomial regression models (and beyond).

**Note:** Please note that the introduction of the polynomial
term helped here. In other situations, you may want to consider
variable transformation, or other types of regression models (non-linear,
logistic,..). Often, you may have to go back to your data and try
something else, e.g. data stratification. Always, always continue to
be on the look-out for better models, because the model you just
developed may not be the best one.

**Test of Overall Regression** -
The calculated *F* statistic is 570.4, and the corresponding *p*-value is practically zero, so the overall regression is statistically significant.

**Parameter Significance** - The calculated *t* values,
and the corresponding *p*-values, indicate which coefficients differ significantly from zero.

All tests look good, the residual plot supports the *OLS* assumptions quite
well, and the coefficients make sense. The model should be rerun without
*b₀*, because the intercept does not test as significantly different from zero.
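The tests above can be reproduced in code. The sketch below uses illustrative data (x range, noise level, and seed are assumptions, so the F value will not match 570.4) and computes the overall F statistic and the coefficient t values directly from the least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(0.0, 51.0)
y = -0.01 * x**2 + 0.5 * x + 0.05 + rng.normal(0.0, 0.5, x.size)

X = np.vander(x, 3, increasing=True)    # columns: 1, x, x^2
n, k = X.shape                          # k = 3 parameters: b0, b1, b2
b = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ b
sse = resid @ resid                     # error sum of squares
ssr = np.sum((X @ b - y.mean()) ** 2)   # regression sum of squares
mse = sse / (n - k)

# Overall regression: F = MSR / MSE with (k - 1, n - k) degrees of freedom
F = (ssr / (k - 1)) / mse

# Parameter significance: t_j = b_j / se(b_j)
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = b / se

print(F)  # large: overall regression is significant
print(t)  # compare each |t| to a t critical value with n - k df
```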