## Theoretical Statistics

(Research Starters)

Statistics allows one to organize and interpret data that would otherwise be incomprehensible. However, statistics is much more than a set of mathematical techniques for manipulating data to derive an answer. For statistics to be truly useful, one must recognize that there is underlying uncertainty and variability in data and collections of data. Analyzing and interpreting data using statistics is a messy process, and sampling error, measurement error, and estimation error can all distort the results. In addition, not every statistical technique is appropriate for every situation. The researcher needs to be careful to pick a technique that matches the characteristics of the data being analyzed. Statistics does not yield exact results, only probabilities. Although it is not an exact science -- or at least not a science of exact results -- statistics can be of immeasurable help in understanding the phenomena of the real world if one understands its theoretical underpinnings.

Taking a statistics course can be an intimidating experience for many people. Perhaps they fear the precision required to calculate answers accurately, or perhaps a long list of arithmetic procedures looks like an overwhelming amount of effort to obtain a single numerical answer. Perhaps, however, the real problem lies with the front and back ends of the statistical process: Determining which analytical technique to use and knowing how to properly interpret the end result of the calculations. The options can be confusing. How does one design an experiment to adequately test a hypothesis so that all the important variables are considered? What statistical analysis technique is appropriate for evaluating the data? Is the test one-tailed or two-tailed? What is the confidence level for the results? These and other questions can plague beginning students and professionals alike in their efforts to make sense of data and draw real world conclusions. The trick to understanding statistics is to understand the theory and principles underlying them.

As human beings, most of us try to move from a position of uncertainty to one of certainty. Knowing "truth" is comforting, and can help us make decisions and plan for the future. However, life does not work that way and statistics does not work that way, either. Rather, statistics suggests with various degrees of confidence (or lack thereof) that one interpretation of the results is more likely than another. Statistics does not yield black-and-white answers: It gives best guesses or scientific estimates.

### Statistical Error

### Sampling Error

In reality, analyzing and interpreting data using statistics is a messy process. Error is a fact of life when dealing with real world problems. People and things do not always act the way that we expect them to. This happens for a number of reasons. First, for most situations it is virtually impossible to gather data on every member of a population -- the entire group of subjects belonging to a certain category. For example, if one wanted to know what features people in the United States would like in a new widget, it would be virtually impossible to ask each individual: There are simply too many people for this to be a reasonable and cost-effective task. In addition, some people may be out of the country or otherwise unavailable for comment. Therefore, data are usually collected on a sample -- a subset of the population that is assumed to be representative of the population. Often a random sample is drawn from the larger population, on the assumption that such samples tend to reflect the characteristics of the larger population. The problem with the assumption, however, is that it is impossible to tell whether or not the sample is truly representative without looking at the characteristics of the population. As a result, sampling error -- an error that occurs in statistical analysis when the sample does not represent the population -- can occur and throw off the results of the research. Further, some people may give the researcher inaccurate answers for any of a number of reasons, ranging from not understanding the question to not paying attention to deliberately trying to throw off the results, again compounding the possibility of error.
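The effect of sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population of "widget ratings" is invented (normally distributed, mean 50, standard deviation 10), and the helper name `sampling_errors` is hypothetical. It draws repeated random samples and records how far each sample mean falls from the population mean.

```python
import random
import statistics


def sampling_errors(population, sample_size, n_samples, seed=0):
    """Return (sample mean - population mean) for repeated random samples."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(n_samples):
        # Simple random sample without replacement
        sample = rng.sample(population, sample_size)
        errors.append(statistics.fmean(sample) - true_mean)
    return errors


# Hypothetical population: 100,000 made-up ratings (assumed for illustration)
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(100_000)]

small = sampling_errors(population, sample_size=25, n_samples=200)
large = sampling_errors(population, sample_size=2500, n_samples=200)

# Spread of the sample means around the population mean
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"typical error with n=25:   {spread_small:.3f}")
print(f"typical error with n=2500: {spread_large:.3f}")
```

In this simulation the population mean is known, so the error of each sample can be measured directly. In real research it cannot, which is exactly the problem the paragraph above describes. Larger random samples tend to shrink the error but do not eliminate it.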

### Measurement Error

Another type of error that can occur when using statistics to analyze data is measurement error. This is the portion of an observed score that reflects noise rather than the true value of what is being measured. For example, the researcher may read the level in a beaker slightly high at times and slightly low at other times. In most cases, such random measurement errors tend to cancel each other out with repeated measurements (i.e., an accidentally inflated value on one measurement is compensated for by an accidentally deflated value on another measurement). However, systematic measurement errors can also be introduced into the situation, and these do not cancel each other out. For example, the way a question is asked on a questionnaire or in an interview may be misunderstood, so that an accurate response is not obtained. Similarly, sometimes researchers unconsciously bias the results by their expectations or by the way that they collect the data. For example, if Harvey likes to flirt with women, he may end up with much better data collected from the women that he interviews than from the cursory discussions he has with the men that he samples.
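The difference between random and systematic measurement error can be sketched in a few lines. This is a minimal illustration, not a model of any real instrument: the true beaker level (250 mL), the noise level, the bias of 1.5 mL, and the function name `simulate_readings` are all assumptions made for the example.

```python
import random
import statistics


def simulate_readings(true_value, n, noise_sd, bias=0.0, seed=0):
    """Simulate n readings: true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_LEVEL = 250.0  # hypothetical true beaker level in mL

# Random error only: readings are sometimes high, sometimes low
random_only = simulate_readings(TRUE_LEVEL, n=10_000, noise_sd=2.0)

# Random error plus a systematic error (every reading is 1.5 mL too high)
with_bias = simulate_readings(TRUE_LEVEL, n=10_000, noise_sd=2.0, bias=1.5)

err_random = statistics.fmean(random_only) - TRUE_LEVEL
err_biased = statistics.fmean(with_bias) - TRUE_LEVEL

print(f"mean error, random noise only:    {err_random:+.3f} mL")
print(f"mean error, with systematic bias: {err_biased:+.3f} mL")
```

Averaging many readings pulls the random component toward zero, but the systematic component survives no matter how many readings are taken.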

### Estimation Error

Estimation error is the error introduced by statistical estimates. This type of error can come from several sources. Although it is assumed that there is a real or "true" value underlying the numbers that are used in statistical calculations, one almost never gets to work with these true values. For example, most people remember the value of π as 3.14, 3.141, 3.1416, or some other finite value. However, π is an irrational number whose decimal expansion never terminates or repeats, so it cannot be written out in full either by finite human beings or by their finite computers. In simple situations, how one rounds π or any long decimal number is not of particular importance to the outcome of the calculations in which it is used. In other instances, however, it can matter a great deal: The rounding error is magnified by subsequent computations and can throw off the entire calculation. Each number is merely an...
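How a small rounding error can be magnified by later computation can be shown with the approximations of π mentioned above. The sketch below is illustrative only: raising the value to the 20th power is an arbitrary choice of "subsequent computation", and `relative_error` is a hypothetical helper. Note that `math.pi` is itself a finite double-precision approximation and serves here only as a much more precise reference value.

```python
import math


def relative_error(approx, exact):
    """Relative difference between an approximation and a reference value."""
    return abs(approx - exact) / abs(exact)


approximations = [3.14, 3.141, 3.1416]
POWER = 20  # arbitrary follow-on computation, chosen for illustration

results = {}
for approx in approximations:
    before = relative_error(approx, math.pi)
    after = relative_error(approx ** POWER, math.pi ** POWER)
    results[approx] = (before, after)
    print(f"pi ~ {approx}: relative error {before:.2e} "
          f"becomes {after:.2e} after raising to the power {POWER}")
```

For small errors, the relative error after raising to a power grows roughly in proportion to the exponent, which is why a rounding choice that is harmless in a single multiplication can matter in a long chain of calculations.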
