a) Suppose the null hypothesis H0 : β1 = ··· = βk = 0 were false. Would the sum of squares due to regression SSR tend to be larger, and if so, why?
b) If H0 : β1 = ··· = βk = 0 were true
i) What is the distribution of the data Yi?
ii) What is the distribution of SST/`sigma^2` where SST is the total sum of squares and `sigma^2` is the true variance of the variable Y?
a) If the null hypothesis
`H_0: beta_1 = ... = beta_k = 0` is false, then at least one of the independent variables (covariates) `X_j`, `j = 1,...,k`, has a true effect on the dependent variable `Y`.
Therefore, the regression model `Y = beta^TX + epsilon` (where at least one slope `beta_j` is non-zero) fits the data better than the null, intercept-only model `Y = mu + epsilon`, where `mu` is simply the true mean of the data `Y`.
A well-known result is that the total sum of squares SST can be partitioned into the sum of squares due to regression SSR and the residual sum of squares, or sum of squares due to error SSE, ie
SST = SSR + SSE
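As a quick numerical illustration (using simulated data, not data from the problem), the decomposition SST = SSR + SSE can be verified for an ordinary least-squares fit that includes an intercept:

```python
import numpy as np

# Simulated example: check SST = SSR + SSE for an OLS fit with an intercept.
# All data below are made up for illustration.
rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)

# Design matrix with an intercept column, fitted by least squares
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat

SST = np.sum((y - y.mean()) ** 2)   # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression
SSE = np.sum((y - y_hat) ** 2)      # residual sum of squares

print(abs(SST - (SSR + SSE)))  # numerically ~0
```

The identity relies on the residuals being orthogonal to the fitted values and summing to zero, which holds whenever an intercept is included in the model.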
If `R^2 = (SSR)/(SST)` is close to 1, then we are more likely to reject the null hypothesis that the covariates `X` have no effect. More formally, the F-test of the significance of the effect of the covariates,
`F = (SSR"/"k)/(SSE"/"(n-k-1))`,
leads to rejection of the null hypothesis H0 if `F` falls in the upper 100`alpha`% tail of the F(k, n-k-1) distribution (for a model with an intercept and `k` slope coefficients).
So, if SSR is a relatively larger component of SST than SSE is, the null is rejected; hence, if the null hypothesis is false, SSR tends to be larger. The more covariates X that have significant, large effects on the outcome Y, the easier it is to detect those effects without collecting a large amount of data, since SSR will then be a large component of SST.
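A short simulation (again with made-up data) sketches this F-test, using degrees of freedom `k` and `n-k-1` for a model with an intercept and `k` slope coefficients:

```python
import numpy as np
from scipy import stats

# Sketch of the overall F-test for regression significance.
# Data are simulated with genuinely non-zero slopes, so F should be large.
rng = np.random.default_rng(1)
n, k = 60, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta_hat
SSR = np.sum((y_hat - y.mean()) ** 2)
SSE = np.sum((y - y_hat) ** 2)

F = (SSR / k) / (SSE / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)  # upper-tail probability of F(k, n-k-1)
print(F, p)  # F is large and p is small, since the true slopes are non-zero
```

With the true slopes set to 2 and -1, SSR dominates SSE and the test rejects H0 at any conventional level.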
b) If the null hypothesis H0 is true, then
i) the distribution of the data `Y_i`, `i = 1,...,n`, is independent Normal(`mu, sigma^2`), ie the model of a horizontal line through the data points (a constant mean) should be found to fit (not be rejected in favour of an alternative model).
ii) by Cochran's Theorem the total sum of squares divided by the true variance of the data `sigma^2` will have a chi-squared distribution with `n-1` degrees of freedom, that is
`(SST)/(sigma^2)` ~ `chi^2_(n-1)`
which can alternatively be written as `(n hat(sigma)^2)/sigma^2` ~ `chi^2_(n-1)`,
since `hat(sigma)^2 = SST"/"n` is the maximum-likelihood estimator of the variance `sigma^2` under H0.
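A Monte Carlo check (with illustrative parameter values) confirms this: under H0, the simulated values of SST/`sigma^2` should have mean `n-1` and variance `2(n-1)`, matching the `chi^2_(n-1)` distribution:

```python
import numpy as np

# Monte Carlo sketch: under H0, Y_i ~ N(mu, sigma^2) iid, and SST/sigma^2
# should behave like a chi-squared variable with n-1 degrees of freedom.
# mu, sigma, n, and the number of replications are arbitrary choices.
rng = np.random.default_rng(2)
n, mu, sigma = 10, 5.0, 2.0
reps = 20000

Y = rng.normal(mu, sigma, size=(reps, n))
SST = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)
ratio = SST / sigma**2

print(ratio.mean())  # close to n - 1 = 9
print(ratio.var())   # close to 2(n - 1) = 18
```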
a) If the null hypothesis were false then we would expect SSR to be larger (a larger component of the total sum of squares SST).
b) If the null hypothesis were true then
i) `Y_i` ~ N(`mu,sigma^2`), `i = 1,...,n`
ii) `(SST)/sigma^2` ~ `chi^2_(n-1)`