# Sampling

## Sampling

### Introduction (Psychology and Mental Health)

A critical part of social research is the decision as to what will be observed and what will not. It is often impractical or even impossible to survey or observe every element of interest. Sampling methodology provides guidelines for choosing from a population some smaller group that represents the population’s important characteristics. There are two general approaches to selecting samples: probability and nonprobability sampling.

Probability sampling techniques allow researchers to select relatively few elements and generalize from these sample elements to the much larger population. For example, in the 1984 U.S. presidential election, George Gallup’s final preelection poll correctly predicted that the popular vote would split 59 percent to 41 percent in favor of Ronald Reagan. This accurate prediction was based on the stated voting intentions of a tiny fraction—less than 0.01 percent—of the 92.5 million people who voted in the election. Accuracy was possible because Gallup used probability sampling techniques to choose a sample that was representative of the general population. A sample is representative of the population from which it is chosen if the aggregate characteristics of the sample closely approximate those same aggregate characteristics in the population. Samples, however, need not be representative in all respects; representativeness is limited to those characteristics that are relevant to the substantive...


### Formulating the Sample (Psychology and Mental Health)

When sampling is necessary, it is essential that the researcher first consider the quality of the sampling frame. A sampling frame is the list or quasi list of elements from which a probability sample is selected. Often, sampling frames do not truly include all of the elements that their names might imply. For example, telephone directories are often taken to be a listing of a city’s population. There are several defects in this reasoning, but the major one involves a social-class bias. Poor people are less likely to have telephones; therefore, a telephone directory sample is likely to have a middle- and upper-class bias. To generalize to the population composing the sampling frame, it is necessary for all of the elements to have equal representation in the frame. Elements that occur more than once will have a greater probability of selection, and the overall sample will overrepresent those elements.
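A short Python sketch shows how duplicate listings distort selection chances. It assumes a single random draw from a frame stored as a plain list. The names and the frame are hypothetical, with one person listed twice (for example, under two telephone numbers).

```python
from collections import Counter

def selection_probabilities(frame):
    """Chance that one random draw from `frame` picks each distinct element."""
    counts = Counter(frame)
    total = len(frame)
    return {element: count / total for element, count in counts.items()}

# Hypothetical frame in which "Baker" appears twice.
frame = ["Adams", "Baker", "Baker", "Cruz", "Diaz"]
print(selection_probabilities(frame))
# → {'Adams': 0.2, 'Baker': 0.4, 'Cruz': 0.2, 'Diaz': 0.2}

# Removing duplicates (keeping first occurrences) restores equal chances.
deduplicated = list(dict.fromkeys(frame))
print(selection_probabilities(deduplicated))
# → {'Adams': 0.25, 'Baker': 0.25, 'Cruz': 0.25, 'Diaz': 0.25}
```

The doubly listed element is twice as likely to be drawn, which is the overrepresentation the paragraph describes.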

Regardless of how carefully the researcher chooses a sampling frame and a representative sample from it, sample values are only approximations of population parameters. Probability theory enables the researcher to estimate how far the sample statistic is likely to diverge from population values, using two key indices called confidence levels and confidence intervals. Both of these are calculated by mathematical procedures that can be found in any basic statistics book.
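As one illustration of such a procedure, the Python sketch below computes a normal-approximation (Wald) confidence interval for a proportion. It assumes a simple random sample large enough for the normal approximation to hold. The poll figures (59 percent of 1,500 respondents) are hypothetical and are not Gallup's actual sample.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, level=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    z = NormalDist().inv_cdf((1 + level) / 2)   # about 1.96 for a 95% confidence level
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # half-width of the interval
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 59% of 1,500 randomly sampled voters favor a candidate.
low, high = proportion_confidence_interval(0.59, 1500)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
# → 95% confidence interval: 0.565 to 0.615
```

Raising the confidence level (say, to 0.99) widens the interval. Greater confidence is bought with less precision.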

A confidence level specifies how confident...


### Sampling Techniques (Psychology and Mental Health)

A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample. Flipping a coin is the most frequently cited example: The “selection” of a head or a tail is independent of previous selections of heads or tails. Instead of flipping a coin, however, researchers usually use a table of random numbers.

A simple random sample may be generated in three steps. First, assign consecutive numbers to the elements in a sampling frame. Next, generate a list of distinct random numbers, within the range of the assigned numbers, whose length equals the desired sample size. Finally, select from the sampling frame every element whose assigned number appears in that list. This is the basic sampling method assumed in survey statistical computations, but it is seldom used in practice because it is often cumbersome and inefficient. For that reason, researchers usually prefer systematic sampling with a random start. This approach, under appropriate circumstances, can generate equally representative samples with relative ease.
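The steps for drawing a simple random sample translate directly into a short Python sketch. The frame of 500 student names is hypothetical. A seeded random number generator stands in for a printed table of random numbers.

```python
import random

def simple_random_sample(frame, sample_size, rng=random):
    """Number the frame 1..N, draw distinct random numbers, keep the matching elements."""
    numbered = {number: element for number, element in enumerate(frame, start=1)}
    # rng.sample draws without replacement, so no number is chosen twice.
    chosen_numbers = rng.sample(range(1, len(frame) + 1), sample_size)
    return [numbered[number] for number in sorted(chosen_numbers)]

frame = [f"student-{i:03d}" for i in range(1, 501)]  # hypothetical frame of 500 names
rng = random.Random(42)                              # seeded so the draw is reproducible
sample = simple_random_sample(frame, 20, rng)
print(sample)
```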

A systematic sample with a random start is generated by selecting every kth element (for example, every fifth element) listed in a sampling frame. Thus, a systematic sample of one hundred can be derived from a sampling frame containing one thousand elements by selecting every tenth element in the...
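Here is a minimal Python sketch of systematic sampling with a random start. It assumes the frame is a list and that the sampling interval k is the frame size divided by the sample size, rounded down. The 1,000-element frame mirrors the example in the text, and the function name is illustrative.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Take every kth element, beginning at a random position among the first k."""
    k = len(frame) // sample_size      # sampling interval
    start = rng.randrange(k)           # random start: index 0 .. k-1
    return frame[start::k][:sample_size]

frame = list(range(1, 1001))           # hypothetical frame of 1,000 numbered elements
sample = systematic_sample(frame, 100, random.Random(7))
print(len(sample), sample[:3])
```

Only one random number is needed (the start), which is why the method is easier to carry out by hand than simple random sampling.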


### Sampling Frame Periodicity (Psychology and Mental Health)

Earl Babbie has described a study of soldiers that illustrates how sampling frame periodicity can produce seriously unrepresentative systematic samples. He reports that the researchers used unit rosters as sampling frames and selected every tenth soldier for the study. The rosters, however, were arranged by squads containing ten members each, and squad members were listed by rank, with sergeants first, followed by corporals and privates. Because this cyclical arrangement coincided with the ten-element sampling interval, the resulting sample contained only sergeants.

Sampling frame periodicity, although a serious threat to sampling validity, can be avoided if researchers carefully study the sampling frame for evidence of periodicity. Periodicity can be corrected by randomizing the entire list before sampling from it or by drawing a simple random sample from within each cyclical portion of the frame.
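The following Python sketch recreates the structure of the roster problem described above and tests the first remedy, randomizing the list. The squad composition (one sergeant, two corporals, seven privates) and the 100-squad roster are assumptions for illustration, not details from Babbie's account.

```python
import random

# Hypothetical squad of ten, listed by rank; 100 squads give 1,000 soldiers.
SQUAD_RANKS = ["sergeant"] + ["corporal"] * 2 + ["private"] * 7
roster = [(squad, rank) for squad in range(1, 101) for rank in SQUAD_RANKS]

def every_kth(frame, k, start):
    return frame[start::k]

# Periodic frame: whatever the start, every tenth soldier holds the same rank.
# Start 0 yields only sergeants, 1-2 only corporals, 3-9 only privates.
for start in range(10):
    ranks = {rank for _, rank in every_kth(roster, 10, start)}
    print(start, ranks)

# Remedy: shuffle the whole roster, then take every tenth soldier.
rng = random.Random(1)
shuffled = roster[:]
rng.shuffle(shuffled)
mixed = every_kth(shuffled, 10, rng.randrange(10))
print(sorted({rank for _, rank in mixed}))
```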

The third method of probability sampling, stratified sampling, is not an alternative to systematic sampling or simple random sampling; rather, it represents a modified framework within which the two methods are used. Instead of sampling from a total population as simple and systematic methods do, stratified sampling organizes a population into homogeneous subsets and selects elements from each subset, using either systematic or simple random procedures. To generate a stratified sample, the researcher begins by...
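A generic Python sketch of the idea follows. It groups a frame into homogeneous strata, then draws a simple random sample within each, using the same sampling fraction in every stratum (proportional allocation). The class-year strata, the 10 percent fraction, and the function name are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sampling_fraction, rng=random):
    """Group the frame into strata, then draw a simple random sample within each."""
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        # Proportional allocation: every stratum gets the same sampling fraction.
        n_h = round(len(members) * sampling_fraction)
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical frame of 1,000 students, stratified by class year.
years = ["first"] * 400 + ["second"] * 300 + ["third"] * 200 + ["fourth"] * 100
students = [(f"student-{i:04d}", year) for i, year in enumerate(years, start=1)]

sample = stratified_sample(students, lambda s: s[1], 0.10, random.Random(9))
print(len(sample))  # → 100
```

The within-stratum draw here is simple random. A systematic draw within each stratum would fit the same framework.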


### Comprehensive Sampling (Psychology and Mental Health)

Simple random sampling, systematic sampling, and stratified sampling are reasonably simple procedures for sampling from lists of elements. If one wishes to sample from a very large population, however, such as all university students in the United States, a comprehensive sampling frame may not be available. In this case, a modified sampling method, called multistage cluster sampling, is appropriate. It begins with the systematic or simple random selection of subgroups or clusters within a population, followed by a systematic or simple random selection of elements within each selected cluster. For example, if a researcher were interested in the population of all university students in the United States, it would be possible to create a list of all the universities, then sample them using either stratified or systematic sampling procedures. Next, the researcher could obtain lists of students from each of the sample universities; each of those lists would then be sampled to provide the final list of university students for study.
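The two stages described above can be sketched in Python. The sketch assumes each cluster (university) comes with its own list of students and uses simple random selection at both stages. The text notes that stratified or systematic selection could be used instead at the first stage. All university and student names are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Stage 1: random sample of clusters. Stage 2: random sample within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return chosen, sample

# Hypothetical list of 50 universities, each with 2,000 students.
universities = {
    f"university-{u:02d}": [f"u{u:02d}-student-{s:04d}" for s in range(1, 2001)]
    for u in range(1, 51)
}
chosen, students = two_stage_cluster_sample(universities, 10, 25, random.Random(3))
print(chosen[:3], len(students))
```

Only the 10 chosen universities ever need to supply student lists, which is the source of the method's efficiency.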

Multistage cluster sampling is an efficient method of sampling a very large population, but the price of that efficiency is a less accurate sample. Although a simple random sample drawn from a population list is subject to a single sampling error, a two-stage cluster sample is subject to two sampling errors. The best way to avoid this problem is to maximize the number of clusters selected while...
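A small simulation suggests why, for a fixed total sample size, the number of clusters matters. It assumes a hypothetical population of 100 clusters whose members closely resemble one another. Two designs each select 200 elements: 5 clusters with 40 elements each, and 40 clusters with 5 elements each. Each design is repeated 500 times in Python, and the variances of the resulting sample means are compared.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(11)
# Hypothetical population: clusters differ a lot, members within a cluster differ little.
population = []
for _ in range(100):
    cluster_mean = rng.gauss(50, 10)
    population.append([cluster_mean + rng.gauss(0, 2) for _ in range(200)])

def two_stage_mean(n_clusters, n_per_cluster):
    """Mean of one two-stage sample: random clusters, then random members in each."""
    chosen = rng.sample(population, n_clusters)
    values = [x for cluster in chosen for x in rng.sample(cluster, n_per_cluster)]
    return fmean(values)

def sampling_variance(n_clusters, n_per_cluster, reps=500):
    return pvariance([two_stage_mean(n_clusters, n_per_cluster) for _ in range(reps)])

few_large = sampling_variance(5, 40)    # 200 elements from 5 clusters
many_small = sampling_variance(40, 5)   # 200 elements from 40 clusters
print(f"5 clusters x 40: {few_large:.2f}   40 clusters x 5: {many_small:.2f}")
```

Under these assumptions the second design gives a much smaller variance. Most of the variation lies between clusters, so it is captured only at the first stage.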


### Statistical Theory (Psychology and Mental Health)

As Raymond Jessen points out, the theory of sampling is probably one of the oldest branches of statistical theory. It has only been since the early twentieth century, however, that there has been much progress in applying that theory to, and developing a new theory for, statistical surveys. One of the earliest applications for sampling was in political polling, perhaps because this area provides researchers with the opportunity to discover the accuracy of their estimates fairly quickly. This area has also been useful in detecting errors in sampling methods. For example, in 1936, the Literary Digest, which had been accurate in predicting the winners of the U.S. presidential elections since 1920, inaccurately predicted that Republican contender Alfred Landon would win 57 percent of the vote over incumbent President Franklin D. Roosevelt’s 43 percent. The Literary Digest’s mistake was an unrepresentative sampling frame consisting of telephone directories and automobile registration lists. This frame resulted in a disproportionately wealthy sample, excluding poor people who predominantly favored Roosevelt’s New Deal recovery programs. This emphasized to researchers that a representative sampling frame was crucial if the sample were to be valid.

In the 1940’s, the U.S. Bureau of the Census developed unequal probability sampling theory, and area-probability sampling methods became widely used and...


### Sources for Further Study (Psychology and Mental Health)

Babbie, Earl R. The Practice of Social Research. 11th ed. Belmont, Calif.: Wadsworth, 2007. Written in clear, easy-to-understand language with many illustrations. Babbie discusses both the logic and the skills necessary to understand sampling and randomization. Contains appendixes, a bibliography, an index, and an excellent glossary. One of the appendixes contains a table of random numbers.

Blalock, Hubert M., Jr. Social Statistics. 2d ed. New York: McGraw-Hill, 1981. Provides an extensive section on sampling that pays particular attention to random sampling, systematic sampling, stratified sampling, and cluster sampling. Although there are some formulas and computations, the majority of the discussion is not technical, and the explanations are clear.

Henry, Gary T. Practical Sampling. Newbury Park, Calif.: Sage Publications, 1998. Provides detailed examples of selecting alternatives in actual sampling practice. Not heavily theoretical or mathematical, although the material is based on the theoretical and mathematical sampling work that has preceded it. Provides references for those interested in proceeding deeper into the literature.

Jessen, Raymond James. Statistical Survey Techniques. New York: John Wiley & Sons, 1978. Provides a clear introduction to statistical sampling. The examples are understandable and relevant, and they illustrate the points made on...


## Sampling (Encyclopedia of Public Health)

In many disciplines, there is often a need to describe the characteristics of some large entity, such as the air quality in a region, the prevalence of smoking in the general population, or the output from a production line of a pharmaceutical company. Due to practical considerations, it is impossible to assay the entire atmosphere, interview every person in the nation, or test every pill. Sampling is the process whereby information is obtained from selected parts of an entity, with the aim of making general statements that apply to the entity as a whole, or an identifiable part of it. Opinion pollsters use sampling to gauge political allegiances or preferences for brands of commercial products, whereas water quality engineers employed by public health departments will take samples of water to make sure it is fit to drink. The process of drawing conclusions about the larger entity based on the information contained in a sample is known as statistical inference.

There are several advantages to using sampling rather than conducting measurements on an entire population. An important advantage is the considerable savings in time and money that can result from collecting information from a much smaller group. When sampling individuals, the reduced number of subjects that need to be contacted may allow more resources to be devoted to finding nonresponders and persuading them to participate. The information collected using sampling is often more accurate, as greater effort can be expended on the training of interviewers, more sophisticated and expensive measurement devices can be used, repeated measurements can be taken, and more detailed questions can be posed.

DEFINITIONS

The term "target population" is commonly used to refer to the group of people or entities (the "universe") to which the findings of the sample are to be generalized. The "sampling unit" is the basic unit (e.g., person, household, pill) around which a sampling procedure is planned. For instance, if one wanted to apply sampling methods to estimate the prevalence of diabetes in a population, the sampling unit would be the person. By contrast, the household would be the sampling unit for a study to determine the number of households in which one or more persons smoke. The "sampling frame" is any list of all the sampling units in the target population. Although a complete list of all individuals in a population is rarely available, an alphabetical listing of the residents of a community and a list of registered voters are examples of sampling frames.

SAMPLING METHODS

The general goal of all sampling methods is to obtain a sample that is representative of the target population. In other words, apart from random error, the information derived from the sample is expected to be the same as it would have been had a complete census of the target population been carried out. The procedures used to select a sample require some prior knowledge of the target population. This knowledge allows a determination of the sample size needed to achieve a reasonable estimate (with acceptable precision and accuracy) of the characteristics of the population. Most sampling methods attempt to select units such that each has a definable probability of being chosen. Methods that adopt this approach are called "probability sampling methods." Examples of such methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
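One common way to turn that prior knowledge into a sample size, when estimating a proportion under simple random sampling, is the formula n = z²p(1 − p)/e². Here p is the expected proportion, e the desired margin of error, and z the normal quantile for the chosen confidence level. The formula can optionally be adjusted by a finite population correction. The Python sketch below assumes that formula. The function name and the default expected proportion of 0.5 (the most conservative guess) are illustrative choices, not part of the text.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, expected_p=0.5, level=0.95, population_size=None):
    """Sample size needed to estimate a proportion within +/- margin."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    n0 = z**2 * expected_p * (1 - expected_p) / margin**2   # size for a very large population
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)         # finite population correction
    return math.ceil(n0)

print(sample_size_for_proportion(0.03))                        # → 1068
print(sample_size_for_proportion(0.03, population_size=5000))  # → 880
```

A better prior estimate of p (far from 0.5) or a wider acceptable margin both reduce the required sample size.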

A random sample is one in which every person (or unit) in the population from which the sample is drawn has a known, nonzero chance of being included. Ideally, the selections that make up the sample are made independently; that is, the choice to select one unit will not affect the chance of another unit being selected. The simplest way of selecting sampling units so that each unit has an equal probability of being chosen is referred to as a simple random sample.

Systematic random sampling involves deciding what fraction of the target population is to be sampled and then compiling an ordered list of the target population. The ordering may be based on the date a patient entered a clinic, patients' surnames, or other factors. Then, starting at the beginning of the list, the initial sample unit is randomly selected from within the first k units, and thereafter every kth individual is sampled. Typically, k is calculated by dividing the size of the target population by the desired sample size and rounding to a whole number. This method of sampling is easy to implement in practice, and the sampling frame can be compiled as the study progresses.
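Because the sampling frame can be compiled as the study progresses, the every-kth rule can be applied to units as they arrive. This Python sketch assumes a hypothetical clinic where about 600 patients are expected and 50 are to be sampled. The generator name and the patient identifiers are invented for illustration.

```python
import random

def systematic_from_stream(arrivals, k, rng=random):
    """Yield every kth arrival, starting at a random position among the first k."""
    start = rng.randrange(1, k + 1)    # 1-based position of the first sampled unit
    for position, unit in enumerate(arrivals, start=1):
        if position >= start and (position - start) % k == 0:
            yield unit

# Hypothetical clinic: about 600 patients expected, 50 to be sampled.
k = 600 // 50                                              # sampling interval, here 12
patient_ids = (f"patient-{i:03d}" for i in range(1, 601))  # patients arrive one at a time
sampled = list(systematic_from_stream(patient_ids, k, random.Random(5)))
print(k, len(sampled))  # → 12 50
```

No complete list is needed in advance. Each patient's sampling decision depends only on arrival order and the random start.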

A stratified random sample divides the population into distinct nonoverlapping subgroups (strata) according to some important characteristics (e.g., age, income) and then a random sample is selected within each subgroup. The investigator can use this method to ensure that each subgroup of interest is represented in the sample. This method generally produces more precise estimates of the characteristics of the target population, unless very small numbers of units are selected within individual strata.
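The precision gain can be seen in a small simulation. The sketch below assumes a hypothetical population with two strata whose values differ sharply. It compares the spread of the sample mean across 1,000 repeated simple random samples of size 50 with the spread across 1,000 proportionally allocated stratified samples of the same size, using only Python's standard library.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2)
# Hypothetical population: two strata with very different typical values.
strata = {
    "younger": [rng.gauss(20, 3) for _ in range(600)],
    "older": [rng.gauss(60, 3) for _ in range(400)],
}
population = [x for members in strata.values() for x in members]
N = len(population)

def srs_mean(n):
    return fmean(rng.sample(population, n))

def stratified_mean(n):
    # Proportional allocation; each stratum mean is weighted by its population share.
    total = 0.0
    for members in strata.values():
        n_h = round(n * len(members) / N)
        total += len(members) / N * fmean(rng.sample(members, n_h))
    return total

reps = 1000
var_srs = pvariance([srs_mean(50) for _ in range(reps)])
var_strat = pvariance([stratified_mean(50) for _ in range(reps)])
print(f"simple random: {var_srs:.3f}   stratified: {var_strat:.3f}")
```

Stratification removes the between-stratum variation from the estimate, so only the (small) within-stratum variation remains. The advantage shrinks when the strata are not internally homogeneous.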

Cluster sampling may be used if the study units form natural groups or if an adequate list of the entire population is difficult to compile. In a national survey, for example, clusters may comprise individuals in a localized geographic area. The clusters (regions) are selected, preferably at random. The persons in each selected region are then enumerated, and random samples are drawn from them. Because sampling is performed at multiple levels, this method is sometimes referred to as multistage sampling.

With nonprobability sampling methods, the probability of being included in the sample is unknown. Examples include convenience samples and samples of volunteers. These types of samples are prone to bias and cannot be assumed to be representative of the target population. For example, people who volunteer frequently differ in many respects from those who do not. Hypothesis tests and statistical inferences about the sampled units and the target population can be applied only with probability sampling methods. That is, there is no way to assess the validity of samples obtained using nonprobability sampling strategies.

VALIDITY AND SOURCES OF ERROR

The distribution of values in any sample, no matter how it is selected, will differ from the distribution in the target population, if only by chance. The larger the sample, the more likely it is that the sample reflects the characteristic of interest in the target population. However, there are also sources of error unrelated to sampling that may bias comparisons between the sampled units and the target population. First, coverage error (selection bias) may arise when the sampling frame does not fully cover the target population. Second, nonresponse bias may occur when sampled individuals cannot be reached or will not provide the information requested. Bias is present if respondents differ systematically from those who do not respond. Finally, the measuring device may not accurately determine the characteristics being measured.
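For nonresponse, the bias in a respondent-only mean follows from a simple identity. It equals the nonresponse rate times the difference between the respondent and nonrespondent means. The Python sketch below assumes hypothetical survey figures. In a real survey the nonrespondent mean is usually unknown, which is why this bias is hard to detect.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent-only mean relative to the mean for the full sample."""
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical smoking survey: 70% of those sampled respond, and smoking is
# more common among the 30% who do not.
response_rate = 0.70
smokers_respondents = 0.18      # prevalence among respondents (observed)
smokers_nonrespondents = 0.30   # prevalence among nonrespondents (unobserved in practice)

full_sample = (response_rate * smokers_respondents
               + (1 - response_rate) * smokers_nonrespondents)
bias = nonresponse_bias(response_rate, smokers_respondents, smokers_nonrespondents)
print(f"full sample {full_sample:.3f}, respondents only {smokers_respondents:.3f}, bias {bias:+.3f}")
# → full sample 0.216, respondents only 0.180, bias -0.036
```

The bias vanishes when everyone responds or when respondents and nonrespondents do not differ, matching the condition stated in the text.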

PAUL J. VILLENEUVE

(SEE ALSO: Statistics for Public Health; Stratification of Data; Survey Research Methods)

BIBLIOGRAPHY

Kelsey, J. L.; Thompson, W. D.; and Evans, A. S. (1986). Methods in Observational Epidemiology. New York: Oxford University Press.

Pagano, M., and Gauvreau, K. (2000). Principles of Biostatistics, 2nd edition. Pacific Grove, CA: Duxbury.