What is the lifetime in hours of bulb 10? How does this measurement affect the estimate of the average lifetime of bulbs of this type?
Discuss the reason why, in some cases, the measurement of the lifetime of bulb 10 might be ignored when analysing the lifetime of these bulbs.
The "heures de vie" or "lifetime in hours" of bulb 10 is, from the table, 12 hours.
Let `x_i` `(i = 1,2,...,20)` denote the lifetimes in hours of bulbs 1-20. The average lifetime of the 19 bulbs other than bulb 10 is
`(sum_(i in (1:20), i != 10) x_i)/19 = 35.3` hrs
When bulb 10 is included, however, the mean becomes
`(sum_(i in (1:20)) x_i)/20 = 35.3(19/20) + 12/20 = 34.1` hrs,
so including bulb 10 lowers the estimated mean lifetime by about 1.2 hours.
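The update of the mean can be checked with a short Python sketch. It uses only the summary figures quoted above (19 bulbs averaging 35.3 hours, plus bulb 10 at 12 hours), because the full table is not reproduced here. The function name `mean_with_extra` is a hypothetical helper introduced for illustration.

```python
# Summary figures quoted in the answer; the full table is not reproduced here.
n_rest = 19              # number of bulbs excluding bulb 10
mean_without_10 = 35.3   # mean lifetime of those 19 bulbs, in hours
x10 = 12.0               # lifetime of bulb 10, in hours


def mean_with_extra(mean_rest, n_rest, new_value):
    """Mean after adding one observation to a group of known mean and size."""
    return (mean_rest * n_rest + new_value) / (n_rest + 1)


mean_all = mean_with_extra(mean_without_10, n_rest, x10)
shift = mean_without_10 - mean_all  # how far bulb 10 pulls the mean down

print(round(mean_all, 1))  # 34.1
print(round(shift, 1))     # 1.2
```

A single observation out of 20 moves the estimate by more than one hour, which is why the treatment of bulb 10 matters.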
There are various ways to assess whether an observation is an 'outlier' in the data, i.e. does not seem to come from the same population as the other data points. A simple thing to look at is the 'Pearson residual'. One can also look at more complicated residuals such as the Studentized residual or a jackknife residual, or at Cook's distance, which measures influence. All of these more complicated measures involve the 'leverage' `h_i` of the observation, which is defined as the ith diagonal element of the 'hat matrix' `H = X(X^TX)^(-1)X^T`. You will come across these measures and the 'hat matrix' if you study the method of 'linear regression'.
The 'Pearson residual' doesn't involve the leverage of the observation (the influence that the observation has on its own fitted value). It is simply
`P_i = (x_i - hat(mu))/hat(V)_i^(1/2)`
where `hat(mu)` is the sample mean `bar(x)` and the variance `V_i` associated with the observation `x_i` is approximated by the sample variance
`hat(sigma)^2 = (sum_(i in (1:20)) (x_i - bar(x))^2)/19 = (sum_(i in (1:20)) (x_i - 34.1)^2)/19 = 57.84`
So the Pearson residual for bulb 10 is given by
`P_(10) = (12-34.1)/sqrt(57.84) = -2.90`
The (one-sided) p-value associated with the Pearson residual for bulb 10 (assuming Normality of the data) is `Phi(-2.90) = 0.0019`. This is very small compared with the p-values for the other bulbs, the smallest of which is `Phi(-1.52) = 0.064`.
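The residual and its one-sided p-value can be reproduced with the Python standard library. This is a minimal sketch that starts from the rounded summary values quoted above (mean 34.1, variance 57.84), so its results may differ from the quoted -2.90 and 0.0019 in the last digit. The helper `normal_cdf` is a name introduced here for illustration; it evaluates the standard Normal distribution function `Phi` through `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard Normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Rounded summary values quoted in the answer.
mean_all = 34.1   # sample mean of all 20 bulbs, in hours
var_hat = 57.84   # sample variance of all 20 bulbs
x10 = 12.0        # lifetime of bulb 10, in hours

# Pearson residual: distance from the mean in estimated standard deviations.
p10 = (x10 - mean_all) / math.sqrt(var_hat)

# One-sided p-value: chance of a residual this low or lower under Normality.
p_value_10 = normal_cdf(p10)

# For comparison, the lowest residual among the other bulbs, as quoted above.
p_value_next = normal_cdf(-1.52)

print(p10, p_value_10, p_value_next)
```

The p-value for bulb 10 comes out at a little under 0.002, against roughly 0.064 for the next most extreme bulb.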
Given that the lifetime of bulb 10 seems to be unusually low compared with the others, one might leave bulb 10 out of calculations estimating the mean and variance of the lifetimes of this type of bulb. However, as this is a relatively small sample, that may be a mistake: bulb lifetimes may follow a distribution with longer tails than the Normal (Gaussian) distribution, which we could only discover with much more data. Under the assumption of Normality, the chance of a lifetime so short is less than 2 in 1000, but we cannot know that the assumption is correct. On the other hand, the observation may have been misrecorded. If it was recorded by a human, an error is considerably more likely, as human error is often non-negligible. In any case, the data point is certainly interesting.
The estimate of the mean lifetime changes quite a lot when bulb 10 is included, dropping from 35.3 to 34.1 hours. The Pearson residual for bulb 10 is unusually low (a chance of less than 2 in 1000 under Normality). The distribution of lifetimes, however, might have longer tails than a Normal distribution, so that the lifetime of bulb 10 isn't as unusual as it appears. There is also the possibility that the lifetime was misrecorded.