## Statistical Reasoning

(Research Starters)

Although statistical techniques can help one better interpret the multitude of data in the world, statistics are ultimately only tools and must be interpreted by human beings. Opportunities to misinterpret statistics abound. Without understanding the principles behind statistical methods, it is difficult to analyze data or to interpret the results correctly. Similarly, a poor understanding of how probability works can lead to flawed experimental design that yields spurious results. However, understanding the principles underlying statistical methods enables one to apply critical thinking and statistical reasoning skills to the analysis and interpretation of data. In the business world, these tools can help managers and others make better decisions that improve the effectiveness and profitability of the organization.

### Overview

The gambler's fallacy is a logical fallacy in which someone incorrectly believes that the outcome of one random event can be predicted from the outcomes of earlier, independent events. For example, a gambler operating under this fallacy might incorrectly assume that because the roulette wheel has landed on red on the last six spins, the "law of averages" dictates that it will land on black the next time. However, this supposed law, which assumes that events even out over time, is merely wishful thinking and does not represent the way that probability really works. In fact, because each spin is independent of the spins before it, the wheel has the same chance of landing on red as it does on black each and every time it is spun.
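The independence of successive spins can be checked with a short simulation. The sketch below is illustrative only: it assumes a simplified wheel with just red and black pockets, each equally likely (real wheels also have green pockets), and the function name `red_rate_after_streak` and its parameters are hypothetical.

```python
import random


def red_rate_after_streak(n_spins=200_000, streak=6, seed=0):
    """Compare the overall share of reds with the share of reds that
    immediately follow `streak` or more consecutive reds.

    Assumption: a simplified two-colour wheel, red and black equally likely.
    """
    rng = random.Random(seed)
    spins = [rng.random() < 0.5 for _ in range(n_spins)]  # True means red

    overall = sum(spins) / n_spins

    run = 0      # length of the current run of reds before this spin
    after = []   # outcomes of spins that follow a long enough red streak
    for is_red in spins:
        if run >= streak:
            after.append(is_red)
        run = run + 1 if is_red else 0

    conditional = sum(after) / len(after)
    return overall, conditional, len(after)


overall, conditional, n_cases = red_rate_after_streak()
print(f"Share of reds overall:            {overall:.3f}")
print(f"Share of reds after six reds:     {conditional:.3f}")
print(f"Number of six-red streaks tested: {n_cases}")
```

If the "law of averages" held, the second proportion would fall well below one half. Under the assumptions of this sketch, both proportions should stay close to 0.5.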

Logical fallacies and erroneous thinking about the meaning of probability and stochastic processes are not confined to the gaming tables. Without understanding the principles behind statistical methods, it is difficult to analyze data or to correctly interpret the results. Take another classic example: It has been noted that the number of births in the villages of a certain northern European country was highly correlated with the number of storks seen nesting in the chimneys. Tongue-in-cheek, the researchers used this to conclude that storks bring babies. However, they went on to explain that this conclusion would be erroneous because the correlation coefficient r indicates only whether two variables are related, not whether one variable depends on the other. The truth was probably that the human parents had been enjoying the warm summer months, which meant the deliveries were coincidentally timed with the appearance of the storks happily nesting over the warm chimneys the following spring. The correlation, therefore, was incidental and not causal.

Although statistics can help one better interpret data, trends, and other empirical evidence, they are only tools and must be interpreted by human beings. Statistics merely describe trends and probabilities and must be interpreted in context. This is not necessarily a straightforward process. For example, a student once announced that he had decided to major in a specialty because the average salary was higher than in other professions he had considered. Although his reasoning seemed sound given his understanding of the data, he had failed to take a number of other variables into account. For example, he did not know what the "average" was based on (that is, whether it was the mean, median, or mode) or what the standard deviation of the distribution was. If the salary distribution for the profession were skewed (i.e., not symmetrical around the mean, so that most data points fall on one side of the mean while those on the other side tend to be outliers) by a few people who make an extraordinarily high amount of money, then the typical salary could be much lower than the reported average. In addition, there was no guarantee that he would earn the "average" salary straight after graduation, or ever. He would have to graduate from college and earn two graduate degrees before he was ready to be considered for a professional salary. Even if he were able to vault these hurdles, his actual salary would still depend on his experience, grades, and other qualifications.

Another example of how measures of central tendency can be misinterpreted involves the differences among the three types: mean, median, and mode. Each of these measures has different characteristics and is more useful in some situations than in others, depending on the characteristics of the underlying data. In a skewed distribution, the median tends to be pulled in the direction of the skew (i.e., toward the end of the distribution with the outliers). If the extreme ends are balanced (i.e., the distribution is not skewed), the median is not affected. However, where the ends are not balanced and the data are clustered toward one end of the distribution, the median shifts away from that cluster and toward the outlying data points.

The mean is even more affected by extreme values. Using the example of the person who is considering a career based on average salary, if the average salary reported were the mode and most of the people in that occupation made only $20,000 per year, it would be quite different from the case in which the statistic reported was the mean, which is pulled in the direction of the skew. As shown in Figure 1, the difference between the various measures of central tendency is real, and the measures are not interchangeable. What a typical person in this example earns is much closer to the mode than to the mean, because the small proportion of people who make much more than the rest pull the mean upward.
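A small numerical sketch can make the gap between the three measures concrete. The salary figures below are invented for illustration and are not taken from Figure 1 or from any real profession. The only detail borrowed from the text is a most common salary of $20,000 alongside one very high earner.

```python
import statistics

# Hypothetical annual salaries for ten people in one occupation.
# Most earn modest amounts; a single outlier earns far more.
salaries = [
    20_000, 20_000, 20_000, 20_000,
    25_000, 30_000, 35_000, 40_000, 50_000,
    500_000,
]

mode_salary = statistics.mode(salaries)      # most frequent value
median_salary = statistics.median(salaries)  # middle of the sorted values
mean_salary = statistics.mean(salaries)      # arithmetic average

print(f"Mode:   ${mode_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
print(f"Mean:   ${mean_salary:,.0f}")
```

With these made-up figures the mode is $20,000, the median is $27,500, and the mean is $76,000. The single high earner raises the mean far above what most people in the list earn, while the median moves only slightly above the mode.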

Opportunities to misinterpret statistics abound. Every time one opens the newspaper, for example, graphs, statistics, and interpretations leap off the page. An advertisement for a book may state that the authors have a combined experience of 50 years in the field. However, if there are five authors who each have 10 years of experience, it does not necessarily follow that the book is more worthwhile than a book that was written by one person with 40 years' experience. The latter book may well offer more insights than a book written by several people, each of whom has only a little experience. However, the quoted statistic does not reflect this difference.

It is not only descriptive statistics that can be misinterpreted; inferential statistics, too, are open to interpretation errors. As mentioned above, one statistic that is frequently misinterpreted is the coefficient of correlation. This inferential statistic is used to determine the degree to which values of one variable are associated with values of another variable. For example, in general, it would be fair to say that weight in the first year of life is positively correlated with age; in other words, the older the baby is, the more it is likely to weigh. However, this same correlation would not apply to most adults, as heavier adults are not necessarily older than lighter adults. Correlation shows only the strength and direction of the relationship between the two variables; it does not explain why the relationship occurs or what caused it.
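The contrast between infants and adults can be sketched by computing the correlation coefficient r directly. The helper `pearson_r` below is written out for illustration, and all ages and weights are hypothetical values chosen to mimic the pattern described above, not measurements from any study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical infants: age in months and weight in kilograms.
infant_age = [0, 2, 4, 6, 8, 10, 12]
infant_weight = [3.4, 5.2, 6.6, 7.6, 8.4, 9.0, 9.6]

# Hypothetical adults: age in years and weight in kilograms.
adult_age = [25, 32, 41, 48, 56, 63, 70]
adult_weight = [82, 64, 91, 70, 77, 60, 85]

r_infants = pearson_r(infant_age, infant_weight)
r_adults = pearson_r(adult_age, adult_weight)

print(f"r for infants: {r_infants:.2f}")
print(f"r for adults:  {r_adults:.2f}")
```

With these invented numbers, r is close to +1 for the infants and close to 0 for the adults. In neither case does the value of r say anything about why the variables are or are not related.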

Inferential statistics are used for hypothesis testing to make inferences about the qualities or characteristics of a population based on observations of a sample. Statistical tests are used to assess how probable the observed results would be if the null hypothesis (H0) were true. The null hypothesis is the statement that there is no statistical difference between the status quo and the experimental condition. If the null hypothesis is true, then the treatment or characteristic being studied made no difference to the end result. For example, a null hypothesis might state that whether a person is a child or an adult has no bearing on whether he or she...
