What does the standard deviation tell me about my data? 

Expert Answers

The standard deviation of a set of data points is a measure for expressing the spread of the data about their (sample) mean. It is the square root of the variance of the data about their mean. The variance in turn is (very close to) the average squared distance of the data from the mean.

Mathematically speaking, if we define our set of data to be `{x_1,x_2,...,x_n}` then

the sample mean of the data is given by

`bar(x) = sum_(i=1)^n x_i/n`

and the sample variance is given by

`s^2 = sum_(i=1)^n (x_i -bar(x))^2/(n-1)`
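
As a minimal sketch, the two formulas above can be evaluated directly in Python. The data values here are made up purely for illustration, and the standard library's `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Made-up data set, chosen only for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

# Sample mean: sum of the data divided by n
mean = sum(data) / n

# Sample variance: sum of squared deviations about the sample mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library, which also divides by n - 1
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))

print(mean, variance, std_dev)
```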

The sample standard deviation is the square root of the sample variance, i.e. `s = sqrt(s^2)`. As a rough rule of thumb, about 95% of future data collected should lie within `bar(x) pm 2s`, provided `bar(x)` and `s` are good estimates of the true population mean and standard deviation and the distribution of the random variable `X` is approximately Normal (Gaussian).
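
The rule of thumb can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exactly Normal with a hypothetical mean of 170 and standard deviation of 10, and the sample sizes and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical Normal population (mean 170, standard deviation 10)
sample = [random.gauss(170, 10) for _ in range(1000)]
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
lower, upper = x_bar - 2 * s, x_bar + 2 * s

# "Future" data drawn from the same population
future = [random.gauss(170, 10) for _ in range(10_000)]
fraction_inside = sum(lower <= x <= upper for x in future) / len(future)

# Exact probability that a Normal variable lies within 2 standard deviations of its mean
std_normal = statistics.NormalDist()
exact = std_normal.cdf(2) - std_normal.cdf(-2)

print(fraction_inside, exact)
```

The exact Normal coverage is a little above 95%, which is why the rule is described as rough. The simulated fraction should land near it, but only because the simulated data really are Normal.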

The reason we don't take the sample variance as the straight average of the squared distances of the data about their mean is that this is a biased estimate of the true population variance: the data lie closer, on average, to their own sample mean than to the true mean, so the straight average tends to be too small. If we took the average squared distance about the true mean (usually denoted `mu`) rather than the sample mean (which is only an estimate of the true mean), this would not be biased. But since we only have the sample mean to work with, we compensate by decreasing the denominator of the estimate for the true variance (usually denoted `sigma^2`) from `n` (the number of data points) to `n-1`.
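
The bias can be seen exactly, without simulation, on a tiny example. The sketch below assumes a made-up population of four values and enumerates every equally likely sample of size 2 drawn with replacement; all names are illustrative.

```python
from itertools import product

# Made-up population; every value is equally likely
population = [1, 2, 3, 4]
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # true variance

n = 2
# All equally likely samples of size n, drawn with replacement
samples = list(product(population, repeat=n))

def sum_sq_dev(sample, centre):
    """Sum of squared deviations of the sample about the given centre."""
    return sum((x - centre) ** 2 for x in sample)

def sample_mean(sample):
    return sum(sample) / len(sample)

# Average, over all samples, of each candidate estimator
avg_div_n = sum(sum_sq_dev(s, sample_mean(s)) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s, sample_mean(s)) / (n - 1) for s in samples) / len(samples)
avg_about_true_mean = sum(sum_sq_dev(s, mu) / n for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1, avg_about_true_mean)
```

Here dividing by `n` about the sample mean averages to half the true variance, while both dividing by `n-1` and measuring about the true mean `mu` average to the true variance itself.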

Measuring spread by squared deviation about the mean of a population is closely tied to the Normal or Gaussian distribution, which has all sorts of nice mathematical properties and is seen often in natural phenomena such as the heights and weights of adult males or females. Another possibility is to consider absolute deviation about the mean, which is a more robust measure of spread as it is not as sensitive to outlying data. Erroneous data points that happen to lie a long way from the mean have a big influence on the estimate of the true standard deviation `sigma`, because their deviations are squared. The same points have less influence on a measure of spread based on absolute rather than squared deviation. Unlike the standard deviation, however, the absolute deviation is not tied to the Normal distribution and does not have nice mathematical properties to work with.
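
To illustrate the difference in sensitivity, the sketch below compares the two measures on a made-up data set before and after one value is replaced by an erroneous reading. The helper `mean_abs_dev` is a hypothetical name for the average absolute deviation about the mean, not a standard library function.

```python
import statistics

def mean_abs_dev(values):
    """Average absolute deviation of the values about their mean."""
    m = statistics.mean(values)
    return sum(abs(x - m) for x in values) / len(values)

# Made-up data: the second list replaces the value 20 with an erroneous 120
clean = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
with_outlier = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 120]

# Factor by which each measure of spread grows once the outlier is present
sd_ratio = statistics.stdev(with_outlier) / statistics.stdev(clean)
mad_ratio = mean_abs_dev(with_outlier) / mean_abs_dev(clean)

print(sd_ratio, mad_ratio)
```

Both measures are inflated by the bad point, but for this data the standard deviation grows by a noticeably larger factor than the absolute deviation does.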

Approved by eNotes Editorial Team