Multiple Linear Regression - Model Testing

The following tests should be carried out:

• Tests of significance of all parameters β0, β1, β2, ..., βk. These tests can also be accomplished by finding the corresponding confidence intervals.
• Test of significance of the overall regression.
• Test of the Ordinary Least Squares Assumptions (OLS).

Parameter Tests

All parameter tests are t-tests of the same form as in simple linear regression. With k+1 parameters there are k+1 t-tests to be conducted, one test for each parameter βj, j = 0, 1, ..., k:

• H0: βj = 0
• H1: βj ≠ 0
• Assume the level of significance α.
• Evaluate the test statistic Tcalc = bj / s(bj), where bj is the least squares estimate of βj and s(bj) is its standard error.
• Determine the critical region (rejection region): degrees of freedom for the t-distribution ν = n-k-1, and critical region Tcalc < -t(α/2, ν) or Tcalc > +t(α/2, ν).
• Conclusion:
If Tcalc falls into the critical region, then reject H0 and conclude that βj is significantly different from zero (i.e. significant). Otherwise accept H0 and conclude that βj is not significantly different from zero (i.e. not significant).
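The steps above can be sketched in code. The data set below is purely hypothetical (n = 8 observations, k = 2 independent variables), and SciPy is assumed to be available for the t critical value; everything else follows the least squares formulae of the notes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: first column of ones corresponds to the intercept b0.
X = np.array([[1, 2.0, 5.0],
              [1, 3.0, 4.0],
              [1, 4.0, 6.0],
              [1, 5.0, 5.5],
              [1, 6.0, 7.0],
              [1, 7.0, 6.5],
              [1, 8.0, 8.0],
              [1, 9.0, 7.5]])
y = np.array([6.0, 7.5, 9.0, 10.0, 12.0, 12.5, 14.5, 15.0])

n, p = X.shape               # p = k + 1 parameters
k = p - 1
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y        # least squares estimates b0, b1, ..., bk
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)        # error variance estimate
se = np.sqrt(s2 * np.diag(XtX_inv))     # standard errors s(bj)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k - 1)   # t(alpha/2, n-k-1)
for j in range(p):
    t_calc = b[j] / se[j]                # Tcalc for H0: beta_j = 0
    significant = abs(t_calc) > t_crit   # Tcalc in the critical region?
    print(f"b{j} = {b[j]:.3f}, Tcalc = {t_calc:.2f}, significant: {significant}")
```

The same `b`, `se`, and `t_crit` quantities reappear in the confidence-interval approach below, which is why software typically reports them side by side.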

Parameter Confidence Intervals

As an alternative to the above parameter tests you may want to find a confidence interval for each parameter. All parameter confidence interval formulae are of the same form. They can be derived from the T-test statistic.

Note: Here you first simply look at the signs of the interval endpoints. If the left-hand endpoint is negative and the right-hand endpoint is positive, then zero is included in the interval; hence the parameter is not significant, because it may be zero. If, on the other hand, both endpoints have the same sign, either both negative or both positive, then zero is not included in the interval, and the parameter is significant (significantly different from zero).
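A sketch of the confidence-interval approach, using the interval bj ± t(α/2, n-k-1)·s(bj) and the sign check just described. The data set is hypothetical and SciPy supplies the t quantile:

```python
import numpy as np
from scipy import stats

# Same hypothetical data as in the t-test sketch.
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 4.0, 6.0], [1, 5.0, 5.5],
              [1, 6.0, 7.0], [1, 7.0, 6.5], [1, 8.0, 8.0], [1, 9.0, 7.5]])
y = np.array([6.0, 7.5, 9.0, 10.0, 12.0, 12.5, 14.5, 15.0])

n, p = X.shape
k = p - 1
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)
se = np.sqrt(s2 * np.diag(XtX_inv))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k - 1)
for j in range(p):
    lo, hi = b[j] - t_crit * se[j], b[j] + t_crit * se[j]
    # Endpoints with the same sign -> zero excluded -> parameter significant.
    print(f"b{j}: [{lo:.3f}, {hi:.3f}] -> significant: {lo * hi > 0}")
```

Note that the sign check is exactly equivalent to the t-test: zero lies outside the interval precisely when |Tcalc| exceeds the critical value.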

Test of the Significance of the Overall Regression

This is an F-test, as in simple linear regression. The test uses the total variability SST in the data. This variability is divided into variability due to regression, SSR, and variability due to randomness, SSE. If all variability in the data were pure randomness, then SST=SSE. If all variability in the data could be accounted for by the relationship between the dependent variable and the independent variables, then SST=SSR. Because some randomness always exists in the data, SST=SSR+SSE. From this decomposition two independent estimators of the error variance σ² are obtained, one using SSR and the other using SSE.

The error variance estimator from SSR becomes s1² = SSR / k.

The error variance estimator from SSE becomes s2² = SSE / (n-k-1).

It can be shown that both estimators are unbiased if the data contains only randomness and there is no relationship between the dependent variable y and the independent variables xj, i.e.

E(s1²) = σ² and E(s2²) = σ²,

and the random variable

F = s1² / s2² = (SSR / k) / (SSE / (n-k-1))

follows the F-distribution with ν1 = k and ν2 = n-k-1 degrees of freedom. However, if there is a relationship between the dependent variable and the independent variables, then s1² is a biased estimator,

E(s1²) > σ²,

and the F random variable does not follow the F-distribution. For significant regression one hopes that s1² would be a biased estimator, and hence that a significant amount of the variability in the data would be contained in SSR.

Note: You may recall that an estimator is said to be unbiased if the expected value of the estimator equals the population parameter. That is all the notation above stands for (E(...) = ...). You don't need to worry about this here; it is just a reminder of why and how we came up with the formulae.

Steps for the Overall Significance Test using the ANOVA Approach

• H0: The regression is not significant (β1 = β2 = ... = βk = 0)
• H1: The regression is significant (at least one βj ≠ 0, j = 1, ..., k)
• Assume the level of significance α.
• Evaluate the test statistic Fcalc = s1² / s2² = (SSR / k) / (SSE / (n-k-1)).

• Determine the critical region (rejection region): Fcalc > f(α, ν1, ν2) = f(α, k, n-k-1).

• ANOVA Summary Table

Source of Variation   SS    df       MS                   F
Regression            SSR   k        s1² = SSR/k          Fcalc = s1²/s2²
Error (Residual)      SSE   n-k-1    s2² = SSE/(n-k-1)
Total                 SST   n-1

Note: This type of ANOVA summary table is part of the regression output of most software capable of statistical analysis, including MS Excel.

• Conclusion

If Fcalc > f(α, ν1, ν2) = f(α, k, n-k-1), then conclude that a significant amount of the variability in the data is explained by the regression model.
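The ANOVA steps can be sketched as follows. The data set is hypothetical and SciPy is assumed for the F quantile; the sums of squares follow the decomposition SST = SSR + SSE from the notes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 8 observations, k = 2 independent variables.
X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 4.0, 6.0], [1, 5.0, 5.5],
              [1, 6.0, 7.0], [1, 7.0, 6.5], [1, 8.0, 8.0], [1, 9.0, 7.5]])
y = np.array([6.0, 7.5, 9.0, 10.0, 12.0, 12.5, 14.5, 15.0])

n, p = X.shape
k = p - 1
b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

sse = np.sum((y - y_hat) ** 2)          # variability due to randomness
sst = np.sum((y - y.mean()) ** 2)       # total variability
ssr = sst - sse                         # variability due to regression

s1_sq = ssr / k                         # error variance estimate from SSR
s2_sq = sse / (n - k - 1)               # error variance estimate from SSE
f_calc = s1_sq / s2_sq

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn=k, dfd=n - k - 1)   # f(alpha, k, n-k-1)
print(f"Fcalc = {f_calc:.2f}, f({alpha},{k},{n-k-1}) = {f_crit:.2f}, "
      f"significant: {f_calc > f_crit}")
```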

Test of the Ordinary Least Squares Assumptions (OLS)

This test is carried out as in simple linear regression, by plotting the residuals ei = yi - ŷi against the estimated (or fitted) values ŷi. Each residual provides a point estimate for the error. Again, we are looking for a horizontal band of dots of roughly constant width, centered around zero for increasing fitted values, and not forming any patterns or clusters.
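A minimal sketch of the residual check, using the same hypothetical data set. The plotting call is left as a comment in case matplotlib is not installed; the numeric summary already shows the residuals one would plot against the fitted values.

```python
import numpy as np

X = np.array([[1, 2.0, 5.0], [1, 3.0, 4.0], [1, 4.0, 6.0], [1, 5.0, 5.5],
              [1, 6.0, 7.0], [1, 7.0, 6.5], [1, 8.0, 8.0], [1, 9.0, 7.5]])
y = np.array([6.0, 7.5, 9.0, 10.0, 12.0, 12.5, 14.5, 15.0])

b = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ b                 # estimated values y-hat
resid = y - fitted             # residuals e_i = y_i - y-hat_i

print("residuals:", np.round(resid, 3))
# With an intercept in the model the residuals sum to (numerically) zero,
# so the band should be centered around zero:
print(f"sum of residuals: {resid.sum():.2e}")

# Uncomment to draw the residual plot if matplotlib is available:
# import matplotlib.pyplot as plt
# plt.scatter(fitted, resid)
# plt.axhline(0)
# plt.xlabel("fitted values")
# plt.ylabel("residuals e_i")
# plt.show()
```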

Note: For practice, please repeat the manual computations for a multiple linear regression case with two independent variables, and create the corresponding MS Excel table (with formulae). This will help you eliminate any and all 'magic' from regression.