## Experimental (Trial) Research

In experimental, or trial, research, the investigator directly controls selected conditions or characteristics of the environment and observes the effects of these changes on other features of the problem at hand in order to determine causal relations. Control over subject and sample selection, study design, environmental conditions, and other facets influencing empirical results is crucial to an understanding of causal relations. Three types of studies are categorized as experimental in social science research: quasi-experiments, randomized field trials, and single-subject studies. All of these methods, excluding several types of pre- and quasi-experiments, rely on comparisons of measurable effects. Experimental research in the social sciences has been critiqued, and its suitability for social studies debated, for centuries. Experimental research cannot offer nuanced descriptions of phenomena or explore problems that are poorly understood, because in these cases hypotheses cannot be adequately formulated. Both experimental and descriptive research are needed to formulate useful theoretical models, as the two methods serve different, complementary purposes.

Keywords Alternate Hypothesis; Construct Validity; Experimental Control; External Validity; Internal Validity; Neyman-Pearson Hypothesis Testing; Null Hypothesis; Null Hypothesis Significance Testing; Predictive Validity; Randomized Field Trials; Scientific Method; Type I Error; Type II Error; Variables

### Overview

### Foundations of Experimental Research

Research, the process through which questions are articulated and investigated, generally takes one of two forms: descriptive or explanatory. Descriptive, or correlational, research aims to elucidate the matter of the inquiry as it naturally manifests in a given environment. In contrast, experimental research manipulates aspects of the environment or of the phenomenon in question in order to discover causal relationships (Marriott, 1998). Some argue that experimental studies are underused in education (National Research Council, 2002).

In experimental, or trial, research, the investigator directly controls selected conditions or characteristics of the environment and observes the effects these changes have on other features of the problem at hand. The variables manipulated in an experiment are referred to as independent, even though they are for the most part controlled by the researcher, because they in turn may control, or cause changes in, other, *dependent* variables. A relationship between the dependent and independent variables is postulated by a hypothesis, the starting point for any experimental study (Marriott, 1998).

Hypotheses are propositions that articulate possible explanations of observed phenomena, and are categorized as either null or alternate. In experimental research, investigators first propose a null hypothesis, which posits that there is no causal relationship between the dependent and independent variables, and an alternate hypothesis, which posits that there is. They then use experiments to manipulate the independent variables and to document observations of the changing relation between dependent and independent variables.

The next step in experimental research is to statistically interpret the collected data. Statistical analysis probes the accumulated empirical observations to determine whether they are consistent with there being no cause-effect relationship between the variables of the study, that is, whether the null hypothesis cannot be rejected (Sax, 1968). If the analysis shows there *are* statistically significant variations in the data that point to the presence of a causal relation, the null hypothesis is rejected, and experimental results are used to propose, support, or refute scientific theories.
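The decision step described above can be sketched in code. The example below is only an illustration under stated assumptions: it uses a permutation test, which is one simple way of testing a null hypothesis and not a procedure named in this article, and the scores, the function name `permutation_p_value`, and the 0.05 threshold are all hypothetical.

```python
import random
import statistics


def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    If the null hypothesis holds, group labels are interchangeable, so we
    shuffle them repeatedly and count how often a difference at least as
    large as the observed one arises by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:n_treated])
            - statistics.mean(pooled[n_treated:])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores, invented purely for illustration.
treated = [78, 85, 90, 88, 84, 91, 87, 83]
control = [72, 75, 80, 70, 74, 79, 77, 73]

alpha = 0.05  # conventional significance threshold (an assumption)
p_value = permutation_p_value(treated, control)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}; reject null hypothesis: {reject_null}")
```

A small p-value indicates that a difference this large would rarely arise if the null hypothesis were true. Whether the difference reflects a causal relation still depends on the design of the experiment.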

Theory formulation is the overarching aim of scientific endeavors, so most researchers perform experiments hoping to reject the null hypothesis (Huysamen, 2005). Studies that do not reject the null hypothesis provide no information about any relations between variables, and can thus only guide further research by providing examples of specific experimental designs that do not merit replication under specific conditions.

If researchers aim to show causal connections, one might wonder why the null, not the alternate, hypothesis is the starting point of an experimental study. Research does not directly attempt to prove causal connections because scientific theories can never be definitively proven; they can only be verified to higher and higher degrees of accuracy. The rejection of a scientific theory is always theoretically possible because one can never rule out that a counterexample will be found. It would therefore be scientifically invalid to formulate a study around an attempted proof of the alternate hypothesis.

### Foundations of Hypothesis Testing

The scientific method, or the method of experimental research, relies on statistical hypothesis testing in its interpretation of results and in its determination of whether the null hypothesis may be rejected. Traditional statistical approaches include significance testing (Fisher, 1925) and Neyman-Pearson hypothesis testing (Neyman & Pearson, 1933). These approaches have been challenged since they were first proposed, but continue to underpin thinking about experimental methods. The debate between the two theoretical approaches is still lively. Currently accepted statistical theories are "hybrids" of Fisher's and Neyman and Pearson's propositions (Balluerka et al., 2005).

Ronald Aylmer Fisher (1890–1962), a British evolutionary biologist and statistician who formalized the notion of the null hypothesis, introduced significance testing (Fisher, 1925). Significance testing, as originally understood, was used only to make binary decisions about whether the null hypothesis could be rejected, and was thus critiqued for its one-dimensionality (Balluerka et al., 2005). Even in its modern multi-dimensional forms, significance testing remains a controversial procedure because it does not account for environmental and sample characteristics (Huysamen, 2005).

### Type I & Type II Errors

Competing with Fisher's significance testing procedure in the early twentieth century was Neyman and Pearson's radically different approach, one grounded in probability theory (1933). Neyman-Pearson hypothesis testing examines empirical data by taking into account two types of errors: that of rejecting the null hypothesis when in fact it is valid (Type I error), and that of standing by the null hypothesis when in fact it should be rejected (Type II error). Neyman and Pearson criticized Fisher for only taking Type I errors into account, as in Fisher's significance testing experimental error is equated with false rejection of the null hypothesis.
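The two error types are often written as conditional probabilities. The symbols $\alpha$, $\beta$, and $H_0$ (the null hypothesis) do not appear in the text above; they are conventional notation introduced here as an assumption.

$$
\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ is true}) && \text{(Type I error rate)} \\
\beta &= P(\text{fail to reject } H_0 \mid H_0 \text{ is false}) && \text{(Type II error rate)}
\end{aligned}
$$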

Neyman and Pearson's hypothesis testing not only differentiates between experimental (Type I & II) errors, and thus offers possibilities for deeper analysis and for incorporating context, but also introduces several conceptual models that have guided modern research design since, such as power analysis (Huysamen, 2005). The power of an experiment is the probability that a Type II error will not occur (Ratnesar & Mackenzie, 2006). Power analysis is used to indicate how reliably Type II errors may be weeded out, as a function of sample size and other factors.
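The dependence of power on sample size can be illustrated with a short calculation. This is a minimal sketch under simplifying assumptions that the article does not specify: a two-sided, two-sample z-test with known and equal variances, equal group sizes, and a standardized effect size. The function name `z_test_power` and the chosen effect size of 0.5 are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test (normal approximation).

    effect_size is the true mean difference divided by the common
    standard deviation; power is 1 minus the Type II error rate.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for n in (10, 20, 50, 100):
    power = z_test_power(effect_size=0.5, n_per_group=n)
    print(f"n per group = {n:>3}: power = {power:.2f}, "
          f"Type II error rate = {1 - power:.2f}")
```

Under these assumptions, power rises as the groups grow. With no true effect, the probability of rejecting the null hypothesis equals the Type I error rate set by `alpha`.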

### Other Statistical Measurements

Fisher's and Neyman and Pearson's contrasting hypothesis testing methods have further inspired modern statistical measurement procedures such as point estimates, confidence intervals, and effect sizes. Point estimates and confidence intervals measure the extent to which an experiment is generalizable, while effect sizes specify the extent to which null hypotheses may be rejected (Balluerka et al., 2005). Modern procedures such as these that attempt to achieve high degrees of experimental generalizability and to provide accurate estimates of the strength of causal relationships have contributed significantly to the furthering of research in scientific methodology. They have enabled researchers to *control* independent variables in experimental designs to higher degrees of accuracy, thus improving the resolution of observed changes in the dependent variables.
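As a rough illustration of these three measures, the sketch below computes a point estimate, a confidence interval, and an effect size for the difference between two groups. The data are hypothetical. The interval uses a normal approximation, which is an assumption made for brevity (small samples would normally call for a t distribution), and Cohen's d is used as one common effect size, not necessarily the one the cited authors have in mind.

```python
import statistics
from statistics import NormalDist


def summarize_difference(treated, control, confidence=0.95):
    """Return (point estimate, confidence interval, Cohen's d)."""
    n_t, n_c = len(treated), len(control)
    var_t = statistics.variance(treated)  # sample variances
    var_c = statistics.variance(control)

    # Point estimate: the observed difference in group means.
    mean_diff = statistics.mean(treated) - statistics.mean(control)

    # Confidence interval from the standard error (normal approximation).
    std_error = (var_t / n_t + var_c / n_c) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    interval = (mean_diff - z * std_error, mean_diff + z * std_error)

    # Effect size: mean difference in units of the pooled standard deviation.
    pooled_var = ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    cohens_d = mean_diff / pooled_var ** 0.5
    return mean_diff, interval, cohens_d


# Hypothetical outcome scores, invented purely for illustration.
treated = [78, 85, 90, 88, 84, 91, 87, 83]
control = [72, 75, 80, 70, 74, 79, 77, 73]

estimate, (low, high), d = summarize_difference(treated, control)
print(f"point estimate = {estimate:.2f}")
print(f"95% CI = ({low:.2f}, {high:.2f})")
print(f"Cohen's d = {d:.2f}")
```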

### Control

Control is the essential assumption of experimental research. Control over subject and sample selection, study design, environmental conditions, and other facets influencing empirical results is crucial to an understanding of causal relations. Causality is reflected in the relationship between dependent and independent variables, thus experimental validity—the extent to which an experiment is under the control of the researcher—is a function of control over independent variables, over design parameters, and over observational accuracy in the measurement of dependent variables (see Figure 1).

**Figure 1:** Experimental control is affected by (a) the inputs of an experiment (independent variables), (b) the experimental design, and (c) observation of resulting changes in dependent variables.

### Validity

Validity, the measurement of the extent of experimental control, takes on various forms, as experimental conclusions are influenced by several types of considerations. The four most commonly recognized types are:

- Predictive,
- Construct,
- Internal, and
- External.

Predictive validity refers to the extent to which results may be employed to predict possibilities of events or characteristics, while construct validity refers to the extent to which the experimental design is conceptually grounded in the philosophical foundations it claims (Beran, 2006).

Internal validity specifies the extent to which the dependent variables of a study are affected by changes in variables not controlled by the researcher. This form of validity may be increased by controlling the experimental design such that no extraneous variables affect the dependent variables (Marriott, 1998). This is difficult to achieve in social science experimental research because "human interactions are often spontaneous, creative, and unpredictable" (Merrett, 2006, p. 146); as a result, extraneous variables in social science experiments can never be fully eliminated.

For example, the results of a study that shows a high degree of causality in the relation between smoking and student grades are affected by many other variables that must be taken into account: student emotional state, IQ, socioeconomic status, and other characteristics (Sax, 1968). Variables that are not accounted for reduce the internal validity of an experiment. Consequently, internal validity can be increased through careful and thorough control of extraneous variables (Campbell & Stanley, 1966). It is, however, impossible to account for *all* extraneous variables while experimentally probing the social world because an experiment cannot, due to computational limitations, keep track of all variables that might characterize a subject.
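The effect of an unaccounted-for extraneous variable can be made concrete with a small simulation. Everything in the sketch below is hypothetical: the data are generated so that smoking has no true effect on grades, a single binary socioeconomic flag stands in for the many extraneous variables named above, and all probabilities and grade levels are invented for illustration.

```python
import random
import statistics


def simulate_students(n=4000, seed=1):
    """Generate hypothetical students; smoking has NO true effect on grades.

    The extraneous variable (high_ses) influences both the chance of
    smoking and the grade, which alone creates an apparent association.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        high_ses = rng.random() < 0.5
        smokes = rng.random() < (0.2 if high_ses else 0.6)
        grade = (80 if high_ses else 70) + rng.gauss(0, 5)
        students.append((high_ses, smokes, grade))
    return students


def mean_grade_gap(students):
    """Mean grade of smokers minus mean grade of non-smokers."""
    smokers = [g for _, s, g in students if s]
    non_smokers = [g for _, s, g in students if not s]
    return statistics.mean(smokers) - statistics.mean(non_smokers)


students = simulate_students()

# Ignoring the extraneous variable: compare all smokers to all non-smokers.
naive_gap = mean_grade_gap(students)

# Controlling for it: compare only within each socioeconomic group,
# then average the two within-group gaps.
within_group_gaps = [
    mean_grade_gap([s for s in students if s[0] == level])
    for level in (True, False)
]
controlled_gap = statistics.mean(within_group_gaps)

print(f"gap ignoring the extraneous variable: {naive_gap:.2f}")
print(f"gap after controlling for it: {controlled_gap:.2f}")
```

In this simulation the naive comparison suggests that smoking lowers grades, while the within-group comparison shows a gap close to zero. Controlling one variable works here only because it is the sole extraneous influence in the simulated data, which is rarely true of real social settings.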

Ways of diminishing the effect of...
