Use the midpoint of each interval to find an estimate for the standard deviation of the weights.
The table below represents weights, w, in grams, of 80 packets of roasted peanuts.
To answer this question, we treat the data as if there were only 7 possible values, namely the midpoints of the intervals. The calculation is written out for the first two; the rest follow the same pattern and are simply stated:
Interval 1 = `(80+85)/2 = 82.5`
Interval 2 = `(85+90)/2 = 87.5`
Interval 3 = 92.5
Interval 4 = 97.5
Interval 5 = 102.5
Interval 6 = 107.5
Interval 7 = 112.5
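The midpoints above can be reproduced with a short sketch. It assumes the class boundaries run from 80 to 115 in steps of 5, which is inferred from the midpoints listed here rather than read from the table; the names `edges` and `midpoints` are illustrative.

```python
# Class boundaries ASSUMED from the listed midpoints: 80, 85, ..., 115.
edges = list(range(80, 120, 5))

# The midpoint of each interval is the average of its two boundaries.
midpoints = [(low + high) / 2 for low, high in zip(edges, edges[1:])]

print(midpoints)
```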
Now, we simply treat the number of packets in each interval as the number of samples with the above values. For example, we see there are 5 packets in the first interval. For this analysis, then, we will now say there are 5 packets of value 82.5.
To calculate the standard deviation, we must first find the mean (`mu`), which is found by summing each packet's value and dividing by the number of packets (`N = 80`). To make this simpler, we can multiply each interval's midpoint by its frequency (the number of packets in that interval) and add the results:
`mu = (5*82.5 + 10*87.5+15*92.5+26*97.5+13*102.5+7*107.5+4*112.5)/(5+10+15+26+13+7+4)`
Simplifying, we get:
`mu = 7745/80 = 96.8125`
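As a quick check of the mean, here is a minimal sketch of the frequency-weighted average. The midpoints and frequencies are taken from the working above; the variable names are only illustrative.

```python
# Midpoints and frequencies as used in the worked answer.
midpoints = [82.5, 87.5, 92.5, 97.5, 102.5, 107.5, 112.5]
frequencies = [5, 10, 15, 26, 13, 7, 4]

n = sum(frequencies)                                       # total number of packets
total = sum(f * x for f, x in zip(frequencies, midpoints)) # sum of frequency * midpoint
mu = total / n                                             # estimated mean weight in grams

print(n, total, mu)  # 80 7745.0 96.8125
```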
Now, we will use the formula for sample standard deviation:
`sigma_x = sqrt((sum_(n=1)^N (x_n-mu)^2)/(N-1))`
Here, we subtract the mean from each packet's value and square the result. We then add all of those squared differences, divide by `N-1`, and take the square root. This gives, roughly, the typical amount by which a value differs from the mean. The calculation is long for the whole set, but I'll write out the first two terms to illustrate (the first two packets both have the value 82.5, so the two terms are identical):
`sigma_x = sqrt(((82.5-96.8125)^2 + (82.5-96.8125)^2 +...)/79)`
`sigma_x = sqrt((204.85 + 204.85+...)/79)`
This value comes out to about 7.45.
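The full sum is tedious by hand, so a small sketch can confirm the result. Each squared deviation is weighted by its frequency instead of being listed 80 times; the names `sum_sq` and `sample_sd` are illustrative.

```python
import math

midpoints = [82.5, 87.5, 92.5, 97.5, 102.5, 107.5, 112.5]
frequencies = [5, 10, 15, 26, 13, 7, 4]

n = sum(frequencies)
mu = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Sum of squared deviations from the mean, each weighted by its frequency.
sum_sq = sum(f * (x - mu) ** 2 for f, x in zip(frequencies, midpoints))

# Sample standard deviation: divide by N - 1, then take the square root.
sample_sd = math.sqrt(sum_sq / (n - 1))

print(round(sample_sd, 2))  # 7.45
```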
It's a fairly long calculation, but 7.45 is the correct sample standard deviation. If you use a spreadsheet or calculator, you may get a slightly different answer depending on which function you use. The population standard deviation has a slightly different formula, where the divisor is `N` instead of `N-1`; for this data it gives about 7.41. We use `N-1` here because the 80 packets are treated as a sample, so the sample standard deviation is the more appropriate estimate.
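To see the difference between the two divisors, the following sketch expands the table into 80 individual values (each packet taking its interval's midpoint) and uses Python's standard `statistics` module. This is only one way to check the numbers, not the method the question requires.

```python
import statistics

midpoints = [82.5, 87.5, 92.5, 97.5, 102.5, 107.5, 112.5]
frequencies = [5, 10, 15, 26, 13, 7, 4]

# Expand the grouped data: repeat each midpoint once per packet in its interval.
data = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]

sample_sd = statistics.stdev(data)       # divisor N - 1
population_sd = statistics.pstdev(data)  # divisor N

print(round(sample_sd, 2), round(population_sd, 2))  # 7.45 7.41
```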