The Misuse of Statistics
Without an understanding of the purpose and limitations of statistical tools, even the most well-intentioned person can easily misuse statistics to support a conclusion that is not valid. Both descriptive and inferential statistics are open to misuse if one is not careful. However, an understanding of what various statistical tools can and cannot do, what assumptions need to be met when using them, and how to appropriately interpret the results of statistical tests can enable one to learn what questions to ask when presented with statistical findings, become a better consumer of statistical information, and be less prone to succumb to the allure of misused statistics.
By definition, science requires the application of the scientific method, in which observations of the real world are turned into testable hypotheses, data are collected and analyzed, and conclusions are drawn based on these results. Hypothesis testing and the concomitant use of statistical tools are the way that any science is advanced and theories are validated or changed. However, the presentation of graphs, charts, or numbers derived from arcane formulae alone is not enough to "prove" whether a hypothesis is correct. Unless one understands the limitations of such statistical tools and how to interpret them, it can be easy for even the most well-intentioned person to misuse statistics to support a conclusion that is not valid. At best, statistics give estimates: scientific gambles, as it were, that one's interpretation of observed behavior approximates the actual underlying causes.
Unfortunately, for many people, the use of statistics seems to throw an aura of arcane acceptability over whatever conclusion they are attached to. We are much more likely to believe a conclusion supported by charts, graphs, or numbers than we are to believe the same conclusion if it is unsupported. "Our company has a combined experience of 112 years" sounds so much more venerable than "We have lots of experience," and "80% of students fear taking a statistics course" sounds more scientific than "Lots of students hate statistics." But the truth is, unless we know where these numbers come from, we do not know what they really mean. The 112 years of experience may actually be the combined ages of the president, vice president, and treasurer of the organization; the 80% of students may refer to a sample drawn from a group of art majors rather than math majors.
Admittedly, the proper use of inferential statistical tools requires training. However, even deceptively simple descriptive statistical techniques can be misused. In most cases, such situations arise due to a lack of understanding of the nature and limitations of the various statistical tools on the part of the person presenting the statistics. In a few cases, however, the person reporting the statistics may actually be trying to mislead the reader. Fortunately, even a little understanding about the nature of statistics can go a long way in helping one be a better informed reader of scientific reports, research studies, and even the daily newspaper. When armed with an understanding of what various statistical tools can and cannot do, what assumptions need to be met when using them, and how to appropriately interpret the results, one can learn what questions to ask when presented with statistical findings, become a better consumer of statistical information, and be less prone to succumb to the allure of misused statistics.
Misuse of Descriptive Statistics
Descriptive statistics can appear to be deceptively simple. Most people learn the basics of calculating a mean and preparing graphs and charts before they reach high school. Newspaper articles, television advertisements, and professional journals all present data summarized by descriptive statistics. However, descriptive statistics cannot be used to draw inferences about or make predictions from a sample of data. The purpose of descriptive statistical techniques is merely to organize and summarize data. Further, one must be careful about how data are displayed using graphical methods so that the data are not misrepresented. One type of misuse of statistics that is commonly seen is shown in the two graphs in Figure 1. Both graphs present the same data. However, the graph on top is designed so that it unfairly magnifies the differences in quarterly income for the four quarters, while the graph on the bottom is drawn to scale, showing that in actuality there is little difference between the quarterly earnings for the four quarters.
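The distortion described for Figure 1 can be put into numbers. The following is a minimal Python sketch, not a reconstruction of the figure: the quarterly income values are hypothetical, as is the helper `visual_ratio`. It compares how many times taller the tallest bar is drawn than the shortest when the vertical axis starts at zero and when it starts just below the smallest value.

```python
# Hypothetical quarterly income figures (illustrative only; not the data in Figure 1).
quarterly_income = {"Q1": 100_000, "Q2": 102_000, "Q3": 101_000, "Q4": 104_000}


def visual_ratio(values, axis_start):
    """Return how many times taller the tallest bar is drawn than the
    shortest when the vertical axis begins at axis_start."""
    heights = [v - axis_start for v in values]
    if min(heights) <= 0:
        raise ValueError("axis_start must be below every value")
    return max(heights) / min(heights)


values = list(quarterly_income.values())
honest = visual_ratio(values, axis_start=0)          # axis starts at zero
truncated = visual_ratio(values, axis_start=99_000)  # axis starts just below the minimum

print(f"Axis from 0:      tallest bar is {honest:.2f}x the shortest")
print(f"Axis from 99,000: tallest bar is {truncated:.2f}x the shortest")
```

With these assumed figures, the largest and smallest quarters differ by 4%, yet the truncated axis draws the tallest bar five times as high as the shortest. A useful question to ask of any bar chart is therefore where its vertical axis begins.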
Advertisements and articles in news media and other publications frequently are illustrated with graphs and statistics from which conclusions are drawn. However, as illustrated above, these data can be misleading, due either to a poor understanding of descriptive statistics or to an intentional attempt to mislead. Therefore, one needs to take into account the type of descriptive statistics used and understand how the shape of a distribution can distort its meaning.
Descriptive statistics is a subset of mathematical statistics that describes and summarizes data. Included under this umbrella are various methods for summarizing and presenting data so that they are more easily understood, such as graphs, charts, and distributions; measures of central tendency that estimate the midpoint of a distribution, such as mean, median, and mode; and measures of variability that summarize how widely dispersed data are in a distribution, such as range, semi-interquartile deviation, and standard deviation. These tools are deceptively simple; in truth, descriptive statistics are misused every day. For example, the three measures of central tendency are all ways to determine the "average" of a distribution of scores. It would be easy to assume that since they are all methods for finding the average, they must be interchangeable. This, however, is not true. Each approach to determining central tendency has its own characteristics and is influenced by different features of the data.
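A small example makes the point that the three "averages" are not interchangeable. The sketch below uses Python's standard `statistics` module on a hypothetical list of salaries, invented purely for illustration, in which most values are modest and one is very large.

```python
import statistics

# Hypothetical annual salaries (illustrative only): most are modest, one is very large.
salaries = [30_000, 30_000, 35_000, 40_000, 45_000, 50_000, 400_000]

mean_salary = statistics.mean(salaries)      # arithmetic average of all values
median_salary = statistics.median(salaries)  # middle value when sorted
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
print(f"mode:   {mode_salary:,.0f}")
```

For this assumed data set the mean is 90,000, the median is 40,000, and the mode is 30,000. Each could truthfully be reported as the "average" salary, although only one of the seven salaries is actually above the mean.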
If the underlying distribution were a perfect normal distribution, these three techniques would all yield the same result. However, real-world data are messy, and underlying distributions are virtually never a perfect bell-shaped curve. Yet often only the "average" is reported, with no indication as to whether it is the mean, median, or mode, so that the reader has no idea how the measure may have been affected. For example, in a skewed distribution, where one end has extreme outliers but the data are otherwise normally distributed, the median may be pulled toward the skew (i.e., toward the end with the outliers). Because of this, when the ends are not balanced and data are clustered toward one end of the distribution, the median shifts away from the cluster and toward the tail, although it depends only on the ordering of the scores and not on how extreme the outliers are. On the other hand, if the extreme ends are balanced (i.e., not skewed), the median is not affected. The mean is also affected by extreme scores, and because every extreme value enters its calculation directly, in a skewed distribution it tends to be pulled even more toward the skew than the median. These tendencies can make significant differences in the resulting values of central tendency. For example, if the mode were used to report the "average" salary for...