Descriptive statistics comprises a set of statistical tools that help sociologists, researchers, and other analysts better understand the masses of data with which they need to work. These tools include various types of charts and graphs to visually display the data so that they can be more easily understood, measures of central tendency that estimate the midpoint of a distribution, and measures of variability that summarize how widely dispersed the data are over the distribution. Each measure of central tendency and variability has particular strengths and weaknesses and should only be used under certain conditions. Descriptive statistics do not allow one to make inferences about the data or to determine whether or not the data values are statistically significant. Rather, they only describe data.
At its most basic, sociology is the study of humans within society. In order to better understand human behavior from this perspective, sociologists attempt to describe, explain, and predict the behavior of people in social contexts. At first glance, this task seems deceptively simple. After all, we usually know how and why we react the way we do in various situations. It should seem a simple step to extrapolate from our own attitudes and behavior to those of people in general. However, it is not valid to assume that everyone thinks or behaves in the same way. Human beings are infinitely diverse, and often two people can look at the same data or situation and arrive at two very different conclusions.
For example, although all voters have access to the same information during a presidential race, these races can be hotly contested, and voters can fiercely disagree over a candidate's merits. Even within the same party, voters can be divided over a candidate, with some giving credence to one piece of information about the candidate and others valuing another piece. It is a truism that people can look at the same situation and honestly disagree. For this reason, it is impossible to extrapolate from the attitudes or behavior of one individual to society at large. To truly describe, explain, and predict the behavior of people in social contexts, sociologists must acquire data on the attitudes and behaviors of more than one individual.
Just as data collected from only one individual are not of much use to sociologists, neither are data collected from a mere two or three people. Sociologists need to gather data from a large number of people in order to have any confidence that their findings can be extrapolated to people in general. The number of people used in sociological research studies routinely reaches into the hundreds for just this reason. Although hundreds or even thousands of inputs will give us a better picture of how people actually react or behave, this massive amount of data leads to another problem: How can we make sense of all the data and interpret them in a meaningful way? Fortunately, the field of mathematics offers us numerous statistical tools that can aid us in this task.
When thinking of statistics, most people think of inferential statistics, which is a subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences from data, such as drawing conclusions about a population based on a sample. This branch of statistics comprises the seemingly arcane formulae and mathematical computations that so many students dread.
However, there is another class of statistical tools that is used to summarize data and develop inputs for use in inferential statistical computation. Although not a substitute for inferential statistics, descriptive statistics is very useful in helping sociologists better understand the masses of data with which they need to work. In general, descriptive statistics is a subset of mathematical statistics that describes and summarizes data. Descriptive statistics are used to summarize and display data through various types of charts and graphs, such as histograms and pie charts. Using these tools, one can easily get a rough idea of the shape of the data; describe the "average" of the data through measures of central tendency, including the mean, median, and mode; and summarize the variability of the data through such measures as the standard deviation, the semi-interquartile deviation, and the range.
One subset of descriptive statistics comprises various graphing techniques that help one organize and summarize data so that they are more easily comprehended. One of the most common and helpful methods for doing this is a frequency distribution. In this technique, data are divided into intervals of typically equal length and displayed using techniques such as a histogram or a stem-and-leaf plot. Graphing data within intervals rather than as individual data points reduces the number of data points on the graph, making both the graph and the underlying data easier to comprehend.
For example, one might seek to understand people's attitudes about the effects of cell phone use on driving behavior by asking 1,000 people to rate the effects on a scale of 1 to 100, with 1 being the most negative and 100 being the most positive. However, it would be difficult to display these results by graphing all 1,000 points. There would be several clusters of data points where a number of people gave the same response, as well as clusters of data points where people gave similar but not identical responses. Although displaying the data in this way certainly shows the full range of people's responses, it is difficult to interpret the data because of the large number of data points. In addition, one must question whether there is truly a meaningful difference between a rating of 22 on a 100-point scale and a rating of 23. Both of the people responding believed that cell phone usage had a negative effect on driving behavior, but can one really say that the person who responded with a 22 felt that much more negatively about the effects of cell phone usage than the person who responded with a 23? Probably not.
Therefore, it is reasonable to aggregate the data into ranges within the span of scores (e.g., 1-10, 11-20, etc.) before graphing them. As a result, the number of points on the graph is decreased and larger patterns can emerge. Figure 1 shows a comparison between a scatter plot of raw data and a histogram with a superimposed frequency distribution.
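The aggregation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `bin_label` and `frequency_distribution` are hypothetical, the sample ratings are invented for the example and are not survey results, and the code assumes a 1-to-100 scale with an interval width that divides the scale evenly.

```python
from collections import Counter


def bin_label(rating, width=10):
    """Return the label of the interval (e.g. '21-30') that contains a rating.

    Assumes ratings start at 1, so the intervals are 1-10, 11-20, and so on.
    """
    low = ((rating - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def frequency_distribution(ratings, width=10, highest=100):
    """Count how many ratings fall in each interval, keeping empty intervals."""
    counts = Counter(bin_label(r, width) for r in ratings)
    labels = [f"{low}-{low + width - 1}" for low in range(1, highest + 1, width)]
    return [(label, counts.get(label, 0)) for label in labels]


# Hypothetical ratings, invented purely for illustration.
sample_ratings = [22, 23, 25, 31, 8, 47, 22, 19, 64, 23, 35, 12]

# Print a simple text histogram: one '#' per response in each interval.
for label, count in frequency_distribution(sample_ratings):
    print(f"{label:>6} | {'#' * count}")
```

Ratings of 22 and 23 land in the same interval, which reflects the point made earlier that such a small difference is probably not meaningful on a 100-point scale.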
Measures of Central Tendency
Although graphing the data using this or other graphing techniques is helpful for better understanding the shape of the underlying distribution, other statistical tools, like measures of central tendency and measures of variability, can be used to understand the data even more thoroughly.
Measures of central tendency estimate the midpoint of a distribution. These measures include
* the median, or the number in the middle of the distribution when the data points are arranged in order;
* the mode, or the number that occurs most often in the distribution; and
* the mean, or the sum of all data values in the distribution divided by the total number of data points in the distribution.
These three methods frequently give different estimates of the midpoint of a distribution because they are all affected differently by the shape of the distribution and by any outlying points.
For example, as shown in Figure 2, for the data set 2, 3, 3, 7, 9, 14, 17, the mode is 3, as there are two 3s in the distribution, but only one of each of the other numbers; the median is 7, since, when the seven numbers in the distribution are arranged numerically, 7 is the number that occurs in the middle; and the mean (or arithmetic mean) is approximately 7.857, since the sum of the seven numbers is 55 and 55 ÷ 7 ≈ 7.857.
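The three measures for this data set can be checked with Python's standard `statistics` module. The sketch below uses the data set from the example; the second list, in which 17 is replaced by 170, is a hypothetical variation added here only to illustrate how an outlying point affects each measure differently.

```python
import statistics

# Data set from the example above.
data = [2, 3, 3, 7, 9, 14, 17]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # sum of the values divided by their count

print(f"mode={mode}, median={median}, mean={mean:.3f}")

# Hypothetical variation: replace the largest value with an outlier.
with_outlier = [2, 3, 3, 7, 9, 14, 170]

print(
    f"mode={statistics.mode(with_outlier)}, "
    f"median={statistics.median(with_outlier)}, "
    f"mean={statistics.mean(with_outlier):.3f}"
)
```

In the variation, the mode and median are unchanged, while the mean rises from about 7.857 to about 29.714, which is one reason the three measures can give different estimates of the midpoint.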