
Hypothesis tests are used in studies of data to compare one hypothesis (the *null hypothesis*) to another (the *alternative hypothesis*). Evidence is collected from the data in the form of a *test statistic* that may or may not be found to support the null hypothesis. The null hypothesis is either based on the most up-to-date knowledge about the target population the data came from, or describes the simplest model. The alternative hypothesis is based on new thinking about the target population, or describes a deviation from the simplest model intended to show that there is some particular effect of interest. If the alternative model is in fact true, this can be indicated by the sampled data, provided there are enough data to demonstrate a statistically significant departure from the null model. There is a subtlety to notice, though: the null hypothesis can be *rejected in favour of the alternative* at some pre-specified level of significance, but the alternative can never be *accepted* with certainty. The margins of error (type I and type II) built into the hypothesis test determine how easy it is to decide between the two hypotheses. The trade-off for an easy decision is a larger potential for making the wrong decision.
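The logic above can be sketched as a minimal one-sample z-test in Python. This is an illustrative sketch using only the standard library; the data, the null mean, and the assumed population standard deviation are all made up for the example:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, assuming a known
    population standard deviation sigma. Returns (z statistic, p-value)."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided p-value
    return z, p

# Hypothetical data: daily water intake (litres) for 10 people,
# tested against a null mean of 2.0 litres with assumed sigma = 0.5.
sample = [2.1, 2.4, 1.9, 2.6, 2.2, 2.5, 2.0, 2.3, 2.7, 2.2]
z, p = one_sample_z_test(sample, mu0=2.0, sigma=0.5)
```

With these made-up numbers the p-value comes out just above 0.05, so at the conventional 5% level the null hypothesis would not be rejected: the evidence leans towards the alternative but is not strong enough to declare significance.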

Suppose, for example, that you wanted to look at the amount of water the average person in a target population drinks in a day. The null hypothesis would say that the average is equal to that in a comparable population where the amount is known. The alternative would say either (i) it is not equal to that amount (a two-sided test), (ii) it is more than that amount (a one-sided test), or (iii) it is less than that amount (the counterpart one-sided test to (ii)). Say that the population you know the amount for is that of the US, and the target population of interest is that of the UK. In that case, you would probably not be sure whether the amount is more or less, as the countries are quite comparable, both being first-world countries, so your alternative would be two-sided (a two-tailed test). If you went for a one-sided test and the data ended up indicating the other side, the whole process of the test would be wasted, as post-hoc analysis after seeing the results should not be done. However, if the target population of interest is that of an African country, you would probably suspect that the amount would be less, and so your alternative would focus on the lower tail of the null distribution; that is, is the observed mean amount of water drunk in a day far out into the lower tail of the null distribution (obtained from the American population)? Doing a two-sided test would in this case probably waste some of the *statistical power* of the test, which equals 1 minus the type II error.
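To make the one-sided versus two-sided distinction concrete, the sketch below computes all three p-values from the same test statistic. The z value is hypothetical and chosen for illustration, and the code uses only the standard library:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical observed z statistic: the sample mean fell below the null mean.
z = -1.8

p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))  # H1: mean != mu0
p_lower = normal_cdf(z)                          # H1: mean <  mu0
p_upper = 1.0 - normal_cdf(z)                    # H1: mean >  mu0
```

Here the lower-tail test rejects at the 5% level while the two-sided test does not, since the two-sided p-value is exactly double the one-sided one. This is the power advantage of a one-sided test when the direction of the effect was correctly anticipated in advance; had the data landed in the upper tail instead, the pre-specified lower-tail test could not legitimately be swapped for the other one.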

To trade off how easy it is to discriminate between the two hypotheses against the chance of being correct in the conclusions, the experimenter sets the type I and type II errors in advance of analysing the data and seeing the patterns in them (otherwise they might be biased by what they see, reacting to patterns that may have arisen purely by chance). The type I error `alpha` is the probability of rejecting the null hypothesis given that it is true; this relates to the *specificity* of the test, which is the probability of not rejecting the null hypothesis given that it is true. If a misplaced positive result isn't a particularly bad thing, the type I error can be set relatively high, say at 10%. On the other hand, if a misplaced positive result is undesirable, it would be set low, say at 1%. The type II error `beta` is the probability of not rejecting the null hypothesis given that it isn't true; this relates to the *sensitivity*, or *power*, of the test, which is the probability of rejecting the null when the alternative is true. If finding true positives is very important, the type II error should be set low, or equivalently the power set high. Conversely, if finding true positives is not as important as the danger of rejecting the null in error, the power would be set relatively lower. Unfortunately, it is not possible to make both the type I and type II errors low at a fixed sample size, and the two must be traded off against one another. However, typical values tend to be `alpha <= 0.05` (5%) and `beta <= 0.2` (at least 80% power). Making some assumptions about the nature of the data, it is possible to work out in advance what size of dataset would be required to demonstrate a true effect given the pre-specified power of the test. If the assumptions are not well founded, however, or the sample was unlucky, a significant effect may not be detected. Even if the *sample size calculation* is made beforehand and there is a true effect, finding it is not guaranteed.
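A sample size calculation of the kind described can be sketched using the standard normal-approximation formula for a two-sided one-sample z-test, n = ((z_(1-alpha/2) + z_power) * sigma / delta)^2. The sigma and delta values below are hypothetical, and the formula assumes a known standard deviation:

```python
import math
from statistics import NormalDist

def sample_size_z(alpha, power, sigma, delta):
    """Approximate sample size for a two-sided one-sample z-test to detect
    a true mean shift of delta, at significance level alpha with the given
    power. A sketch: assumes known sigma and the normal approximation."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)              # quantile for the power
    n = ((z_a + z_b) * sigma / delta) ** 2
    return math.ceil(n)  # round up to the next whole observation

# Hypothetical planning numbers: detect a 0.2-litre shift in mean daily
# water intake, assuming sigma = 0.5 litres, at alpha = 5% and 80% power.
n = sample_size_z(alpha=0.05, power=0.8, sigma=0.5, delta=0.2)
```

Note how the trade-offs in the text appear directly in the formula: demanding a smaller `alpha` or a higher `power` increases both z quantiles, and hence the required n, while a larger true effect `delta` shrinks it.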
