Possible scores on a homework assignment were between 0 and 43. A sample of the actual scores had a mean of 40 and a standard deviation of 5. What could the underlying distribution be? How many students' marks went into this sample?
Assume that the probability of a student earning any given mark is `p`, and that it is the same across all questions (and parts of questions) and all students. Then the score `X` follows `X ~ "Binomial"(n,p)`, where `n=43` is the total number of available marks. The expected score for a student is `E[X] = mu = np = 43p`, and the variance of the scores is `"Var"[X] = sigma^2 = np(1-p) = 43p(1-p)`. We can then use the data to estimate `p`.
For a Binomial distribution the mean is estimated from a sample as `hat(mu) = nhat(p)`, where `hat(p)` is the observed average proportion of marks gained over the students in the sample, so that `nhat(p)` is the mean number of marks gained. Here we estimate `nhat(p) = 40`, so that `hat(p) = 40/n = 40/43`.
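As a quick check of this estimate, here is a minimal Python sketch. The variable names (`n`, `sample_mean`, `p_hat`) are illustrative choices, not part of the original question.

```python
from fractions import Fraction

n = 43            # total available marks (given in the question)
sample_mean = 40  # reported sample mean (given in the question)

# Moment estimate under the Binomial model: n * p_hat = sample mean
p_hat = Fraction(sample_mean, n)

print(p_hat)         # exact fraction
print(float(p_hat))  # decimal value, roughly 0.93
```

Using `Fraction` keeps the estimate exact, which avoids rounding when it is reused in later steps.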
We are also told that the standard deviation of the sample is approximately 5. Suppose the class consists of `m` students who took the test, and take the reported figure to be the spread of the sum of `m` values of the variable `X`, treated here as `m"SD"(X) = msigma`. Then `mhat(sigma) = 5`, approximately.
To find `m`, set
`hat(sigma) = sqrt(nhat(p)(1-hat(p))) = sqrt(43hat(p)(1-hat(p)))` so that
`sqrt(43hat(p)(1-hat(p))) = 5/m`, which implies
`m = 5/sqrt(43hat(p)(1-hat(p)))`.
Since we found earlier that `hat(p) = 40/43`, it follows that
`m = 5/sqrt(43(40/43)(3/43)) = 5/sqrt(120/43) ~~ 3`.
Check: `m=3` would imply `3hat(sigma) = 5` (from `mhat(sigma) = 5`), which implies `hat(sigma) = 5/3` (approximately).
We also have `hat(sigma) = sqrt(43hat(p)(1-hat(p))) ~~ 1.67`, which is indeed approximately equal to 5/3.
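The arithmetic can be reproduced with a short Python sketch. It takes the relation `mhat(sigma) = 5` used in this answer as given; the variable names are illustrative.

```python
import math

n = 43           # total available marks
p_hat = 40 / 43  # estimated probability of gaining a mark
reported_sd = 5  # standard deviation quoted in the question

# Binomial standard deviation: sqrt(n * p * (1 - p))
sigma_hat = math.sqrt(n * p_hat * (1 - p_hat))

# Solve m * sigma_hat = reported_sd for m (the relation assumed in this answer)
m = reported_sd / sigma_hat

print(round(sigma_hat, 2))  # 1.67
print(round(m, 2))          # 2.99
```

Rounding `m` to the nearest whole number gives 3 students.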
The distribution of scores can be modelled as Binomial(n,p) with n = 43. From the sample mean we estimate p = 40/43. From the sample standard deviation we infer that the sample was of size m = 3. If there were more students in the sample, this would imply that the sample is underdispersed (the standard deviation is smaller than the Binomial model predicts). Underdispersion would imply a lack of randomness in the sample, meaning that the questions aren't varied enough, or that the students' responses to the questions are standard, or that some of the students are copying each other's work!