Discuss the distinction between the standard deviation and the standard error of an observable random variable `X`. Include an explanation as to how, for collected samples of size `n`, the Z formula for individual observed values of `X` differs from the Z value for observed values of the sample mean `bar(X)`. Also discuss how you would calculate the value of the sample mean `bar(X)` beyond which the top 10% of all possible sample means lie.
The standard deviation is a parameter of the population distribution. To be specific, the standard deviation is the square root of the variance parameter of that distribution. If the measure or variable of interest `X` follows a Normal `N(mu,sigma^2)` distribution, then the variance parameter is defined as `sigma^2` and the standard deviation parameter as `sigma`. The other parameter of the Normal distribution is the mean parameter, `mu`. The variance (parameter) `sigma^2 = V[X]` of a random variable is the second central moment about the mean (parameter) `mu = E[X]`, that is
`V[X] = E[(X-E[X])^2]`
To transform `X` to a standard Normal deviate `Z` one needs to subtract the mean (parameter) and divide by the standard deviation (parameter), so that
`(X-mu)/sigma quad ~ quad N(0,1)`
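As a minimal sketch of this standardisation, the Python below converts one observed value of `X` to a `Z` value. The values of `mu`, `sigma` and `x` are hypothetical, chosen only for illustration, and the helper name `z_score` is not from the text above.

```python
from statistics import NormalDist

# Hypothetical population parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0


def z_score(x, mu, sigma):
    """Standardise a single observed value of X: subtract mu, divide by sigma."""
    return (x - mu) / sigma


x = 130.0
z = z_score(x, mu, sigma)  # (130 - 100) / 15 = 2.0

# The probability of falling below x under N(mu, sigma^2) matches the
# probability of falling below z under N(0, 1).
p_x = NormalDist(mu, sigma).cdf(x)
p_z = NormalDist().cdf(z)
print(z, p_x, p_z)
```

The two probabilities agree because standardising only shifts and rescales the distribution.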
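The sample standard deviation, by contrast, is an estimate of `sigma` calculated from a sample, where the sample consists of `n` independently and identically distributed (iid) observed values of `X` : `x_1,x_2,...,x_n`
The variance of such a sample is given by
`hat(sigma)^2 = sum_(i=1)^n (x_i-bar(x))^2/(n-1)`
where `bar(x) = sum_(i=1)^n x_i/n` is the sample mean. Therefore, the sample standard deviation is given by `sqrt(hat(sigma)^2) = hat(sigma)`. The standard error is a different quantity: it is the standard deviation of the sample mean `bar(X)` over repeated samples of size `n`, which equals `sigma/sqrt(n)` and is estimated by `hat(sigma)/sqrt(n)`. The standard deviation describes the spread of individual values of `X`, whereas the standard error describes the spread of sample means and shrinks as `n` grows.
To transform the sample values of `X` to approximately standard Normal deviates `Z` one this time subtracts the sample mean and divides by the sample standard deviation, giving
`z_i = (x_i-bar(x))/hat(sigma) quad i = 1,2,...,n`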
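The sample quantities can be sketched in Python as follows. The data values are made up for illustration, and the estimated standard error of the mean is taken to be `hat(sigma)/sqrt(n)`, the usual definition, which is an assumption stated here rather than something derived in the code.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 8 observed values of X (illustrative only).
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.6]
n = len(sample)

x_bar = mean(sample)             # sample mean
sigma_hat = stdev(sample)        # sample standard deviation (n - 1 divisor)
std_error = sigma_hat / sqrt(n)  # estimated standard error of the mean

# Approximate Z values for the individual observations.
z_values = [(x - x_bar) / sigma_hat for x in sample]

print(x_bar, sigma_hat, std_error)
print(z_values)
```

Note that `statistics.stdev` uses the `n - 1` divisor, matching the formula for `hat(sigma)^2`, whereas `statistics.pstdev` would divide by `n`.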
The true distribution of the sample mean over many experiments, taking samples of size `n` over and over again (and conditional on the distribution of `X` being `N(mu,sigma^2)` as above), is given by
`bar(X) ~ N(mu,sigma^2/n)`
This is called the sampling distribution of the sample mean.
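The sample mean can be standardised to a `N(0,1)` `Z` statistic by subtracting the mean of this sampling distribution, `mu`, and dividing by its standard deviation, the standard error `sqrt(sigma^2/n) = sigma/sqrt(n)`, giving
`(bar(X)-mu)/(sigma/sqrt(n)) ~ N(0,1)`
This differs from the Z formula for an individual value of `X` only in the denominator: `sigma` is replaced by `sigma/sqrt(n)`.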
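A small simulation can illustrate that the spread of sample means is `sigma/sqrt(n)`, the square root of the variance `sigma^2/n` of the sampling distribution. This is only a sketch: the values of `mu`, `sigma`, `n` and the number of repeated experiments are hypothetical, and the helper `sample_mean` is introduced purely for the example.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters and sample size.
mu, sigma, n = 50.0, 10.0, 25
n_experiments = 5000


def sample_mean(mu, sigma, n):
    """Draw n iid N(mu, sigma^2) values and return their mean."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n


sample_means = [sample_mean(mu, sigma, n) for _ in range(n_experiments)]

theoretical_se = sigma / sqrt(n)     # 10 / 5 = 2.0
empirical_se = pstdev(sample_means)  # spread of the simulated sample means

# Standardise each simulated sample mean with the standard error.
z_means = [(m - mu) / theoretical_se for m in sample_means]

print(theoretical_se, empirical_se, pstdev(z_means))
```

The empirical spread of the simulated means should be close to `theoretical_se`, and the standardised means should have a spread close to 1.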
The top 10% of possible observed sample means lie above the 90th percentile of the sampling distribution, which corresponds to the 90th percentile of the standard Normal distribution, `Phi^(-1)(0.9)`, where `Phi` is the cdf of the standard Normal distribution. From statistical tables for the standard Normal we find that
`Phi^(-1)(0.9) = 1.282`
so that we require `(bar(x)-mu)/(sigma/sqrt(n)) > 1.282` in order for a particular observed value `bar(x)` of `bar(X)` to be in the upper 10% of the distribution of possible sample means, that is, the sampling distribution of the sample mean. Here `sigma/sqrt(n)` is the standard error, the square root of the variance `sigma^2/n` of that sampling distribution. Rearranging this expression gives that we require
`bar(x) > mu + 1.282(sigma/sqrt(n))`
in order for an observed `bar(x)` to be in the upper 10% of the sampling distribution of the sample mean.
In other words, the observed sample mean would need to be more than 1.282 standard errors, that is 1.282 times `sigma/sqrt(n)`, above the population mean in order to be in the top 10% of the sampling distribution.
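The cutoff can be computed numerically instead of from tables. The sketch below uses Python's `statistics.NormalDist` and takes the standard error to be `sigma/sqrt(n)`, as implied by `bar(X) ~ N(mu,sigma^2/n)`. The values of `mu`, `sigma` and `n` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population parameters and sample size.
mu, sigma, n = 50.0, 10.0, 25

z_90 = NormalDist().inv_cdf(0.90)  # 90th percentile of N(0, 1), about 1.2816
std_error = sigma / sqrt(n)        # standard deviation of the sample mean
cutoff = mu + z_90 * std_error     # sample means above this are in the top 10%

# Equivalent route: take the 90th percentile of N(mu, sigma^2 / n) directly.
cutoff_direct = NormalDist(mu, std_error).inv_cdf(0.90)

print(z_90, cutoff, cutoff_direct)
```

With these illustrative numbers the standard error is 2, so the cutoff is about 52.56.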