To better describe, explain, and predict the behavior of groups of people in society, sociologists make observations, develop hypotheses, and collect data with the intent of drawing conclusions from their findings. Inferential statistics is a family of tools that supports these efforts by allowing sociologists to draw conclusions from data and to test whether the results of a study are likely due to chance or to some underlying phenomenon. A wide range of statistical methods is available for testing hypotheses, and each is appropriate to a different type of research design. These tools include t-tests, the z statistic, analysis of variance (ANOVA), and regression analysis.
Using Statistics to Analyze Data
The overarching goal of sociological research is to describe, explain, and predict the behavior of people within society. To this end, sociologists observe behavior, develop hypotheses, collect data, and draw conclusions from their findings. Although it would be possible to perform these activities based on the input from a few people, human beings are infinitely diverse. Consequently, in most situations, it is virtually impossible to predict the behavior of a large group of people based on the actions of just one individual.
For example, despite our attempts at prognostication, it can be difficult to predict the outcome of a political election. Different people can look at the same data concerning opposing candidates and draw vastly different conclusions about the candidates' likelihood of being elected. Even the fact that a certain percentage of eligible voters cast their ballots for a given candidate in a primary election does not necessarily mean that they will do so again in the general election. An independent, for example, may vote in the primary of one party in order to help ensure that the candidate he or she prefers from that party is nominated in case the candidate of his or her choice from an opposing party is not elected. Another voter might try to ensure that an opposing party's least electable candidate is nominated so that the candidate from his or her preferred party has a better chance of winning the general election. Therefore, it is difficult to extrapolate from the fact that a candidate won a primary election that he or she will win the general election.
To be better able to meet the sociological goal of predicting behavior within society, it is important that data be collected from a wide range of individuals. In this way, patterns greater than the opinions or actions of a given individual or small group can emerge, and the sociologist can draw conclusions about the actions of a population that are based on a representative sample of the population.
One way to do this is by collecting data from a wide variety of people and determining what the average response to a situation is. Descriptive statistics, which include measures of central tendency or "average" (mean, median, and mode), may give us a better picture of the inclinations of the population. However, an average alone is still a very restricted picture.
For example, if we give registered voters a questionnaire asking them to rate on a scale of 1 to 10 how much they like a certain candidate, the average answer might be 4.5. Based on this piece of information, we might conclude that the candidate is neither well liked nor strongly disliked. However, how the raw data are distributed across the scale is very important. If the raw data were clustered around the middle of the scale, this conclusion would probably be correct; if the raw data were spread evenly across a wide range of ratings, this conclusion would be less warranted, and we would need to conduct further investigations to determine how much the candidate is really liked. Similarly, if the data were polarized, with approximately half the people polled disliking the candidate extremely and the other half liking the candidate extremely, we could still obtain the same 4.5 "average" score, despite the fact that no one was ambivalent about the candidate.
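The point about distributions can be illustrated with a minimal sketch. The three sets of ratings below are invented for illustration and are not data from any study; they are chosen so that each has a mean of exactly 4.5. The "even" sample covers ratings 1 through 8 rather than the full scale so that its mean stays at 4.5, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, median, pstdev

# Hypothetical ratings on a 1-to-10 scale (invented for illustration).
samples = {
    "clustered": [4, 5] * 9,            # everyone near the middle of the scale
    "even": list(range(1, 9)) * 2,      # ratings 1 through 8, each given twice
    "polarized": [1] * 11 + [10] * 7,   # strong dislike versus strong liking
}

def summarize(ratings):
    """Return the mean, the median, and the population standard deviation."""
    return {
        "mean": mean(ratings),
        "median": median(ratings),
        "spread": pstdev(ratings),
    }

for name, ratings in samples.items():
    s = summarize(ratings)
    print(f"{name:10s} mean={s['mean']:.1f} "
          f"median={s['median']:.1f} spread={s['spread']:.2f}")
```

All three samples report the same mean, but the standard deviation grows from the clustered sample to the polarized one, which is the information the average alone hides.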
Such problems with interpretation are not the only drawback to using descriptive statistics alone to draw conclusions from a sample. On the same 10-point scale, can we say with confidence that there is truly a difference between a score of seven and a score of eight? To overcome these and other limitations of descriptive statistics, sociologists and other scientists turn to inferential statistics in order to draw conclusions, or inferences, from their data.
What Is Inferential Statistics?
Inferential statistics is a subset of mathematical statistics that is used in the analysis and interpretation of data. In the examples above, inferential statistics could help us better understand the results of the primary election or meaningfully interpret the results of the polling data. Inferential statistics is used to test hypotheses to determine whether the results of a study have statistical significance, meaning that they occur at a rate that is unlikely to be due to chance.
A hypothesis is an empirically verifiable declarative statement concerning the relationship between independent and dependent variables and their corresponding measures. An example of a hypothesis might be the assertion that people offer friendship more readily to those they feel are similar to themselves than they do to others. In this example, the independent variable, or the variable that is manipulated by the researcher, is the degree of similarity between the subject and the people around him or her. The dependent variable, or the subject's response to the independent variable, would be to whom the subject offers friendship.
Hypotheses are stated in two ways. A null hypothesis (H0) is a statement that denies that there is a statistical difference between the status quo and the experimental condition. In other words, it states that the independent variable being studied makes no difference to the end result. For example, a null hypothesis about people's preference for befriending others with whom they believe they have something in common might be, "There is no difference in the number of overtures of friendship made to strangers based on whether or not the strangers have something in common with the subject." This null hypothesis states that there is no relationship between the independent variable of perceiving that one has something in common with another person and the dependent variable of whether or not the person being studied makes an overture of friendship to that other person. The alternative hypothesis (H1) would be that there is a relationship between the two variables. For example: "People offer friendship more readily to those they feel are similar to themselves than they do to others."
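As a rough sketch of how such a pair of hypotheses might be tested, the code below runs an exact permutation test, which is one of several possible methods and is not necessarily the one a given study would use. The counts of friendship overtures are invented for illustration, and the function name `permutation_p_value` is hypothetical. Under the null hypothesis the group labels are interchangeable, so the test asks how often a random relabeling of the subjects produces a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test of mean(group_a) > mean(group_b).

    Returns the observed difference in means and the share of all
    relabelings whose difference is at least as large.
    """
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled scores to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = sum_a / n_a - (total - sum_a) / n_b
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
        n_splits += 1
    return observed, as_extreme / n_splits

# Hypothetical overtures of friendship per subject (invented data).
similar_strangers = [5, 6, 7, 6, 8, 7]
dissimilar_strangers = [3, 4, 2, 5, 4, 3]

diff, p = permutation_p_value(similar_strangers, dissimilar_strangers)
print(f"difference in means = {diff:.1f}, p = {p:.4f}")
```

With these invented numbers only 2 of the 924 possible relabelings match the observed difference of 3.0, so the p-value is about 0.002 and the null hypothesis of no difference would be rejected at the conventional 0.05 level.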
Once the null hypothesis has been formulated, an experimental design is developed that allows the researcher to empirically test the hypothesis. Typically, the experimental design includes a control group that does not receive the experimental conditions and an experimental group that does. In this case, the individuals in the control group would not be exposed to people who are markedly different from them, while the individuals in the experimental group would be. The researcher then collects data from people in the study to determine whether or not the experimental condition had any effect on the outcome. After the data have been collected, they are statistically analyzed to determine whether the null hypothesis should be accepted (more precisely, not rejected) or rejected. Accepting the null hypothesis means that, if the data in the population are normally distributed, the results are more than likely due to chance. This is illustrated in Figure 1 as the unshaded portion of the...