# How does the type of data collected and the way in which the data are collected affect the possibility of a Type I or Type II error?

mathsworkmusic | Certified Educator

When testing a hypothesis it is very important to be clear about what measure is actually of interest and what target population it applies to.

Often when we collect data we cannot get directly at the quantity we would really like to measure, e.g. 'does this person have cancer?', 'is this person happy?', 'is this person unhealthy?'. What we must collect instead, because it is the best we can do, are surrogate data: measurements of symptoms or indicators of the underlying condition we are in fact interested in. Because we are measuring secondary rather than primary information, the results may be biased with respect to the measure of interest. This bias may lead to higher type I and type II error rates (i.e. more false positives and false negatives) than the nominal rates specified by the design of the test. Questionnaire data are particularly notorious for this bias, as it is difficult to ask subjective questions like 'how do you feel?' and expect quantitatively meaningful answers. Even if the questions are objective, people may answer inaccurately (knowingly or unknowingly). An example is asking how often someone drinks, smokes or eats fast food.
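To make the effect on the type I error rate concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the null value `mu0`, the known standard deviation `sigma`, the size of the systematic under-reporting `bias`, and the use of a simple two-sided z-test at the 5% level. The null hypothesis is true for the quantity of interest, yet the test is run on a surrogate that is shifted by the bias.

```python
import math
import random
import statistics

def rejection_rate(bias, n=50, mu0=10.0, sigma=2.0, trials=2000, seed=0):
    """Fraction of simulated studies that reject H0: mean == mu0 at the 5% level.

    H0 is true for the quantity of interest. `bias` is a hypothetical
    systematic shift in the surrogate measurement (e.g. under-reporting).
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value for a z-test
    rejections = 0
    for _ in range(trials):
        # What we actually observe: the true value plus the surrogate's bias.
        sample = [rng.gauss(mu0, sigma) + bias for _ in range(n)]
        # z-statistic, treating sigma as known (a simplifying assumption).
        z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

unbiased_rate = rejection_rate(bias=0.0)   # should sit near the nominal 5%
biased_rate = rejection_rate(bias=-0.5)    # every rejection is a false positive
print(f"unbiased: {unbiased_rate:.3f}, biased: {biased_rate:.3f}")
```

With an unbiased measurement the rejection rate should stay close to the nominal 5%, while the biased surrogate rejects a true null far more often. By the same logic, a bias that acts against a real effect would hide it and so raise the type II error rate.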

Another form of bias that may affect the type I and type II error rates is sampling bias. If the sample is not representative of the target population, the data collected will be biased: individuals from other populations, which are not of interest, may differ from the target population in ways that matter for the result, and so contaminate the sample.
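A contaminated sample can be sketched in the same way, this time looking at the type II error rate. The setup is hypothetical: the target population has a real effect of size `effect` above `mu0`, but a fraction `contamination` of the sample is drawn from another population with no effect. The test is again a two-sided z-test at the 5% level with `sigma` treated as known.

```python
import math
import random
import statistics

def type_ii_rate(contamination, n=50, mu0=10.0, effect=1.0, sigma=2.0,
                 trials=2000, seed=1):
    """Fraction of simulated studies that fail to reject H0: mean == mu0,
    even though the target population truly has mean mu0 + effect."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        sample = []
        for _ in range(n):
            # With probability `contamination` the respondent comes from a
            # population outside the target, where there is no effect.
            from_outside = rng.random() < contamination
            mean = mu0 if from_outside else mu0 + effect
            sample.append(rng.gauss(mean, sigma))
        z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) <= 1.96:  # not significant: the real effect is missed
            misses += 1
    return misses / trials

clean_rate = type_ii_rate(contamination=0.0)
contaminated_rate = type_ii_rate(contamination=0.5)
print(f"clean: {clean_rate:.3f}, contaminated: {contaminated_rate:.3f}")
```

In this sketch the contaminated sample dilutes the effect, so the test misses it much more often than the design assumed. If the outside population instead differed from the null value while the target population did not, the same mechanism would produce false positives.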

Both these sources of bias relate to relevance of the data collected. Do they get to the bottom of what is being asked? Are the data sampled from the relevant population?

Sometimes it is impossible to remove this bias, but awareness that it is there is useful nonetheless. If the bias can be measured on a small sample, that estimate can be used to correct a larger sample. For example, the differences between answers given anonymously and answers given directly to an interviewer could be measured on a small group, and that information used to correct for bias in wider face-to-face surveys.
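The correction idea can be sketched as follows. All numbers are invented for illustration: `TRUE_MEAN` and `REPORT_BIAS` are hypothetical, and the sketch assumes that anonymous answers are unbiased and that the bias is a constant shift, neither of which need hold in practice. A small calibration group answers both ways, the mean difference estimates the bias, and that estimate is subtracted from a larger face-to-face survey.

```python
import random
import statistics

rng = random.Random(42)
TRUE_MEAN = 10.0     # hypothetical true average (e.g. drinks per week)
REPORT_BIAS = -2.0   # hypothetical under-reporting when asked face to face

def face_to_face_answer(true_value):
    """Simulated face-to-face answer: truth plus bias plus response noise."""
    return true_value + REPORT_BIAS + rng.gauss(0.0, 0.5)

def anonymous_answer(true_value):
    """Simulated anonymous answer: assumed unbiased, with response noise."""
    return true_value + rng.gauss(0.0, 0.5)

# Small calibration sample: the same 200 people answer both ways.
calibration_truth = [rng.gauss(TRUE_MEAN, 3.0) for _ in range(200)]
differences = [face_to_face_answer(t) - anonymous_answer(t)
               for t in calibration_truth]
estimated_bias = statistics.fmean(differences)

# Larger survey: 5000 people, face-to-face answers only.
survey = [face_to_face_answer(rng.gauss(TRUE_MEAN, 3.0)) for _ in range(5000)]
raw_mean = statistics.fmean(survey)
corrected_mean = raw_mean - estimated_bias

print(f"estimated bias: {estimated_bias:.2f}")
print(f"raw mean: {raw_mean:.2f}, corrected mean: {corrected_mean:.2f}")
```

The estimated bias carries its own sampling error from the small calibration group, so a careful analysis would fold that extra uncertainty into any test run on the corrected data.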

Surrogate or secondary measurements may lead to bias in the data, as may contaminated samples. Such bias may push the actual type I and type II error rates above their prespecified levels.