Hypothesis Testing (Encyclopedia of Psychology)
The method psychologists employ to prove or disprove the validity of their hypotheses.
When psychologists engage in research, they generate specific questions called hypotheses. Research hypotheses are informed speculations about the likely results
| | You conclude that the two groups differ, so you reject the Null Hypothesis. | You conclude that the two groups do not differ, so you fail to reject the Null Hypothesis. |
| Two groups really do differ | You correctly rejected the Null Hypothesis. You made a good decision. | You made a Type II error. You should have said there is a difference, but you made a mistake and said there wasn't. |
| Two groups really do not differ | You made a Type I error. You said that the groups are different, but you made a mistake. | You correctly failed to reject the Null Hypothesis. You said that the groups are not different, and... |
Hypothesis Testing (Encyclopedia of Management)
Social science research, and by extension business research, uses a number of different approaches to study a variety of issues. This research may be a very informal, simple process or it may be a formal, somewhat sophisticated process. Regardless of the type of process, all research begins with a generalized idea in the form of a research question or a hypothesis. A research question usually is posed in the beginning of a research effort or in a specific area of study that has had little formal research. A research question may take the form of a basic question about some issue or phenomenon or a question about the relationship between two or more variables. For example, a research question might be: "Do flexible work hours improve employee productivity?" Another question might be: "How do flexible hours influence employees' work?"
A hypothesis differs from a research question; it is more specific and makes a prediction. It is a tentative statement about the relationship between two or more variables. The major difference between a research question and a hypothesis is that a hypothesis predicts an experimental outcome. For example, a hypothesis might state: "There is a positive relationship between the availability of flexible work hours and employee productivity."
Hypotheses provide the following benefits:
- They determine the focus and direction for a research effort.
- Their development forces the researcher to clearly state the purpose of the research activity.
- They determine what variables will not be considered in a study, as well as those that will be considered.
- They require the researcher to have an operational definition of the variables of interest.
The worth of a hypothesis often depends on the researcher's skills. Since the hypothesis is the basis of a research study, it is necessary for the hypothesis to be developed with a great deal of thought and contemplation. There are basic criteria to consider when developing a hypothesis, in order to ensure that it meets the needs of the study and the researcher. A good hypothesis should:
- Have logical consistency. Based on the current research literature and knowledge base, does this hypothesis make sense?
- Be in step with the current literature and/or provide a good basis for any differences. Though it does not have to support the current body of literature, it is necessary to provide a good rationale for stepping away from the mainstream.
- Be testable. If one cannot design the means to conduct the research, the hypothesis means nothing.
- Be stated in clear and simple terms in order to reduce confusion.
HYPOTHESIS TESTING PROCESS
Hypothesis testing is a systematic method used to evaluate data and aid the decision-making process. Following is a typical series of steps involved in hypothesis testing:
- State the hypotheses of interest
- Determine the appropriate test statistic
- Specify the level of statistical significance
- Determine the decision rule for rejecting or not rejecting the null hypothesis
- Collect the data and perform the needed calculations
- Decide to reject or not reject the null hypothesis
Each step in the process will be discussed in detail, and an example will follow the discussion of the steps.
STATING THE HYPOTHESES.
A research study includes at least two hypotheses: the null hypothesis and the alternative hypothesis. The hypothesis being tested is referred to as the null hypothesis and it is designated as H0. It also is referred to as the hypothesis of no difference and should include a statement of equality (=, ≥, or ≤). The alternative hypothesis, designated as H1, presents the alternative to the null and includes a statement of inequality (≠, <, or >). The null hypothesis and the alternative hypothesis are complementary.
The null hypothesis is the statement that is believed to be correct throughout the analysis, and it is the null hypothesis upon which the analysis is based. For example, the null hypothesis might state that the average age of entering college freshmen is 21 years.
H0: The average age of entering college freshmen = 21 years
If the data one collects and analyzes indicate that the average age of entering college freshmen is greater than or less than 21 years, the null hypothesis is rejected. In this case the alternative hypothesis could be stated in the following three ways: (1) the average age of entering college freshmen is not 21 years (the average age of entering college freshmen ≠ 21 years); (2) the average age of entering college freshmen is less than 21 years (the average age of entering college freshmen < 21 years); or (3) the average age of entering college freshmen is greater than 21 years (the average age of entering college freshmen > 21 years).
The choice of which alternative hypothesis to use is generally determined by the study's objective. The preceding second and third examples of alternative hypotheses involve the use of a "one-tailed" statistical test. This is referred to as "one-tailed" because a direction (greater than [>] or less than [<]) is implied in the statement. The first example represents a "two-tailed" test. There is inequality expressed (age ≠ 21 years), but the inequality does not imply direction. One-tailed tests are used more often in management and marketing research because there usually is a need to imply a specific direction in the outcome. For example, it is more likely that a researcher would want to know if Product A performed better than Product B (Product A performance > Product B performance), or vice versa (Product A performance < Product B performance), rather than whether Product A performed differently than Product B (Product A performance ≠ Product B performance). Additionally, more useful information is gained by knowing that employees who work from 7:00 a.m. to 4:00 p.m. are more productive than those who work from 3:00 p.m. to 12:00 a.m. (early shift employee production > late shift employee production), rather than simply knowing that these employees have different levels of productivity (early shift employee production ≠ late shift employee production).
Both the alternative and the null hypotheses must be determined and stated prior to the collection of data. Before the alternative and null hypotheses can be formulated it is necessary to decide on the desired or expected conclusion of the research. Generally, the desired conclusion of the study is stated in the alternative hypothesis. This is true as long as the null hypothesis can include a statement of equality. For example, suppose that a researcher is interested in exploring the effects of amount of study time on test scores. The researcher believes that students who study longer perform better on tests. Specifically, the researcher suggests that students who spend four hours studying for an exam will get a better score than those who study two hours. In this case the hypotheses might be:
H0: The average test score of students who study 4 hours for the test = the average test score of those who study 2 hours.
H1: The average test score of students who study 4 hours for the test > the average test score of those who study 2 hours.
As a result of the statistical analysis, the null hypothesis can be rejected or not rejected. As a principle of rigorous scientific method, this subtle but important point means that the null hypothesis cannot be accepted. If the null is rejected, the alternative hypothesis can be accepted; however, if the null is not rejected, we can't conclude that the null hypothesis is true. The rationale is that evidence that supports a hypothesis is not conclusive, but evidence that negates a hypothesis is ample to discredit a hypothesis. The analysis of study time and test scores provides an example. If the results of one study indicate that the test scores of students who study 4 hours are significantly better than the test scores of students who study two hours, the null hypothesis can be rejected because the researcher has found one case when the null is not true. However, if the results of the study indicate that the test scores of those who study 4 hours are not significantly better than those who study 2 hours, the null hypothesis cannot be rejected. One also cannot conclude that the null hypothesis is accepted because these results are only one set of score comparisons. Just because the null hypothesis is true in one situation does not mean it is always true.
DETERMINING THE APPROPRIATE TEST STATISTIC.
The appropriate test statistic (the statistic to be used in statistical hypothesis testing) is based on various characteristics of the sample population of interest, including sample size and distribution. The test statistic can assume many numerical values. Since the value of the test statistic has a significant effect on the decision, one must use the appropriate statistic in order to obtain meaningful results. Most test statistics follow this general pattern:
$$\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized parameter}}{\text{standard error of the statistic}}$$
For example, the appropriate statistic to use when testing a hypothesis about a population mean is:
$$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
In this formula Z = test statistic, $\bar{X}$ = mean of the sample, μ = mean of the population, s = standard deviation of the sample, and n = number in the sample.
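The Z statistic for a population mean described above can be sketched in a few lines of Python. This is only an illustration: the function name `z_statistic` and the sample figures (64 freshmen, mean age 21.5, standard deviation 2) are hypothetical and do not come from the article.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Return Z = (sample mean - hypothesized mean) / (sample sd / sqrt(n))."""
    if n <= 0 or sample_sd <= 0:
        raise ValueError("n and sample_sd must be positive")
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical figures: 64 freshmen, mean age 21.5, standard deviation 2,
# tested against the null value of 21 years.
z = z_statistic(21.5, 21.0, 2.0, 64)
print(round(z, 2))
```

The numerator measures how far the sample result lies from the value claimed by the null hypothesis, and the denominator rescales that distance into standard-error units so it can be compared with a standard normal distribution.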
SPECIFYING THE STATISTICAL SIGNIFICANCE LEVEL.
As previously noted, one can reject a null hypothesis or fail to reject a null hypothesis. A null hypothesis that is rejected may, in reality, be true or false. Additionally, a null hypothesis that fails to be rejected may, in reality, be true or false. The outcome that a researcher desires is to reject a false null hypothesis or to fail to reject a true null hypothesis. However, there always is the possibility of rejecting a true hypothesis or failing to reject a false hypothesis.
Rejecting a null hypothesis that is true is called a Type I error and failing to reject a false null hypothesis is called a Type II error. The probability of committing a Type I error is termed α and the probability of committing a Type II error is termed β. As the value of α increases, the probability of committing a Type I error increases. As the value of β increases, the probability of committing a Type II error increases. While one would like to decrease the probability of committing both types of errors, for a fixed sample size the reduction of α results in the increase of β and vice versa. The best way to reduce the probability of both types of error is to increase sample size.
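The tradeoff between α and β, and the effect of sample size, can be illustrated numerically. The sketch below assumes a one-tailed (upper) Z test with a known standard deviation and a hypothetical true difference of 0.5 units from the null value; the function name `type_ii_error` and all figures are illustrative assumptions, not part of the article.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def type_ii_error(alpha, effect, sd, n):
    """Beta for an upper-tailed Z test when the true mean exceeds
    the null value by `effect` (assumed known standard deviation `sd`)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)   # rejection threshold for Z
    shift = effect * math.sqrt(n) / sd       # how far the true mean moves Z
    return STD_NORMAL.cdf(z_crit - shift)    # chance Z stays below threshold


# Hypothetical scenario: true difference 0.5, standard deviation 2.
for n in (25, 100):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(alpha, effect=0.5, sd=2.0, n=n)
        print(f"n={n:>3}  alpha={alpha:.2f}  beta={beta:.3f}")
```

Under these assumptions, lowering α at a fixed sample size raises β, while a larger sample lowers β without changing α.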
The probability of committing a Type I error, α, is called the level of significance. Before data are collected one must specify a level of significance, or the probability of committing a Type I error (rejecting a true null hypothesis). There is an inverse relationship between a researcher's desire to avoid making a Type I error and the selected value of α; if not making the error is particularly important, a low probability of making the error is sought. The greater the desire is to not reject a true null hypothesis, the lower the selected value of α. In theory, the value of α can be any value between 0 and 1. However, the most common values used in social science research are .05, .01, and .001, which respectively correspond to the levels of 95 percent, 99 percent, and 99.9 percent likelihood that a Type I error is not being made. The tradeoff for choosing a higher level of certainty (a smaller α) is that it will take much stronger statistical evidence to ever reject the null hypothesis.
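To see how the choice of α translates into "stronger evidence," one can compute the critical value of Z for each common significance level. The following is a minimal sketch assuming a Z test; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Positive critical value of Z for significance level alpha.

    A two-tailed test splits alpha evenly between both tails;
    a one-tailed test places all of alpha in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.01, 0.001):
    two = critical_z(alpha)
    one = critical_z(alpha, two_tailed=False)
    print(f"alpha={alpha}: two-tailed ±{two:.3f}, one-tailed {one:.3f}")
```

As α shrinks, the critical value moves farther from zero, so the sample result must lie farther from the hypothesized value before the null hypothesis is rejected.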
DETERMINING THE DECISION RULE.
Before data are collected and analyzed it is necessary to determine under what circumstances the null hypothesis will be rejected or fail to be rejected. The decision rule can be stated in terms of the computed test statistic, or in probabilistic terms. The same decision will be reached regardless of which method is chosen.
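The claim that both forms of the decision rule lead to the same decision can be checked with a short sketch. It assumes a two-tailed Z test and interprets "probabilistic terms" as comparing a p-value with α, which is an assumption about the article's intent; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def reject_by_critical_value(z, alpha):
    """Two-tailed rule: reject when |Z| reaches the critical value."""
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)


def reject_by_p_value(z, alpha):
    """Two-tailed rule: reject when the p-value is no larger than alpha."""
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_value <= alpha


# Hypothetical computed Z values, each checked under both rules.
for z in (-2.5, -1.0, 0.3, 1.9, 2.1):
    print(z, reject_by_critical_value(z, 0.05), reject_by_p_value(z, 0.05))
```

Each pair of results agrees because the p-value falls below α exactly when the test statistic lies beyond the critical value.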
COLLECTING THE DATA AND PERFORMING THE CALCULATIONS.
The method of data collection is determined early in the research process. Once a research question is determined, one must make decisions regarding what type of data is needed and how the data will be collected. This decision establishes the basis for how the data will be analyzed. One should use only approved research methods for collecting and analyzing data.
DECIDING WHETHER TO REJECT THE NULL HYPOTHESIS.
This step involves the application of the decision rule. The decision rule allows one to reject or fail to reject the null hypothesis. If one rejects the null hypothesis, the alternative hypothesis can be accepted. However, as discussed earlier, if one fails to reject the null he or she can only suggest that the null may be true.
XYZ Corporation is a company that is focused on a stable workforce that has very little turnover. XYZ has been in business for 50 years and has more than 10,000 employees. The company has always promoted the idea that its employees stay with it for a very long time, and it has used the following line in its recruitment brochures: "The average tenure of our employees is 20 years." Since XYZ isn't quite sure if that statement is still true, a random sample of 100 employees is taken and the average tenure turns out to be 19 years with a standard deviation of 2 years. Can XYZ continue to make its claim, or does it need to make a change?
- State the hypotheses.
H0: μ = 20 years
H1: μ ≠ 20 years
- Determine the test statistic. Since we are testing a population mean that is normally distributed, the appropriate test statistic is:
$$Z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
- Specify the significance level. Since the firm would like to keep its present message to new recruits, it selects a fairly weak significance level (α = .05). Since this is a two-tailed test, half of the alpha will be assigned to each tail of the distribution. In this situation the critical values are Z = +1.96 and Z = −1.96.
- State the decision rule. If the computed value of Z is greater than or equal to +1.96 or less than or equal to −1.96, the null hypothesis is rejected.
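- Reject or fail to reject the null. With a sample mean of 19 years, a standard deviation of 2 years, and a sample of 100, the computed value is Z = (19 − 20) / (2 / √100) = −5.0. Since −5.0 is less than −1.96, the null is rejected. The mean tenure is not 20 years; therefore XYZ needs to change its statement.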
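The steps of the XYZ example can be retraced with a short sketch. It uses only the figures stated in the example (sample mean 19, standard deviation 2, sample size 100, α = .05); with those figures the statistic works out to −5.0, which falls in the lower rejection region. The variable names are illustrative.

```python
import math
from statistics import NormalDist

hypothesized_mean = 20.0   # H0: mean tenure is 20 years
sample_mean = 19.0         # sample average reported in the example
sample_sd = 2.0            # sample standard deviation
n = 100                    # sample size
alpha = 0.05               # two-tailed significance level

# Test statistic: distance from the null value in standard-error units.
z = (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

# Two-tailed critical value: half of alpha in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Decision rule: reject when Z is at or beyond either critical value.
reject_null = abs(z) >= z_crit

print(f"Z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject_null}")
```

Because the test is two-tailed, the sign of Z does not affect the decision; only its distance from zero is compared with the critical value.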
Hypothesis Testing (Encyclopedia of Business)
Hypothesis testing, the backbone of the scientific method, is a methodology for evaluating a business or economic theory. A hypothesis is a proposition or statement about the world (derived from any source: from whim or fancy, from accumulated knowledge, from dominant or heretical ideas, from prejudices, or from guesses) that is capable of being confronted with facts and is thus capable of being refuted or confirmed by those facts. In any field of science, from physics and chemistry to economics and sociology, practitioners often pursue questions using this method, generally referred to as the scientific method. The overarching process involves the formulation of hypotheses (statements), testing them against the facts, and rejecting those statements that are refuted or reformulating them in accordance with information derived from the testing. Business and economic applications of hypothesis testing include researching consumer behavior, formulating economic models, and evaluating corporate strategies, among many others.
HYPOTHESIS TESTING IN THE NATURAL SCIENCES
In many of the natural sciences, hypothesis testing takes place in the context of controlled laboratory experiments (so as to isolate a particular phenomenon or causal effect). For example, a medical researcher may wish to test the proposition that smoking causes lung cancer. In order to properly test the hypothesis, he or she might try to look at identical individuals in identical environments, with the only difference (assuming that all other factors could be controlled) being that one group smokes while the other group (the control group) does not. If the group that smoked eventually developed lung cancer, the researcher could conclude that his or her hypothesis was confirmed.
HYPOTHESIS TESTING IN THE SOCIAL SCIENCES
By contrast, in the social sciences, investigators often resort to secondary analysis; statistical methods are employed to analyze data because social phenomena are rarely, if ever, amenable to laboratory-type experiments. Hypotheses are tested using statistical techniques in order to infer conclusions about a population from information obtained from a subset (or sample) of that population. Statistical inference (based on laws of probability) is then used to test whether a particular observed phenomenon is due to chance.
For example, we might wish to test whether the observation that men's wages on average are significantly higher than women's wages is not a random event characteristic of a particular sample of the men and women we surveyed. To test this we would formulate a null hypothesis that the true mean wages of men and women are equal. To err on the conservative side, null hypotheses generally assume there is no relationship between the factors being observed; the logic being that it is a lesser mistake to fail to find a relationship than to assert falsely that there is one. In our example, this would mean we assume there is no difference in pay attributable to sex. If the statistical evidence is strong enough, however, we reject the null hypothesis and accept the alternative: that the differences are not due to chance.
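A test of this kind can be sketched as a large-sample comparison of two means. The article does not specify a particular test, so the normal approximation used here is an assumption, and the wage figures are hypothetical values chosen only to make the sketch runnable.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (z, two-tailed p-value) for H0: the two population means are equal.

    Uses a large-sample normal approximation with separate group variances.
    """
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical survey summaries (annual wages), not real data.
z, p = two_sample_z(mean_a=52000, sd_a=9000, n_a=200,
                    mean_b=49000, sd_b=8500, n_b=200)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p <= alpha}")
```

Rejecting the null here says only that the observed gap is unlikely to be a sampling accident; it says nothing about what produces the gap.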
We could then discuss why this might be the case. This is where the controversy would arise. Hypothesis testing may allow the researcher to find a connection between observed phenomena, but a simple correlation does not necessarily identify or explain the causes or dynamics of that relationship. In other words, it would be premature to conclude that sex discrimination is the cause, even if we have concluded that wages are materially different. To test the discrimination theory, a new hypothesis, and a new means of testing it, would have to be devised. Of course it is one thing to posit a hypothesis and quite another to devise a meaningful test of it. In this example, while it may be easy enough to prove a correlation between sex and pay, it would be much more complicated to demonstrate how the difference is put into effect; the project would likely involve a series of additional hypotheses relating to specific, measurable indicators of discrimination and other factors that could affect wages.
In econometrics, the branch of economic statistics that most often deals with hypothesis testing, an investigator might assume some relationship between variables for purposes of statistical testing. For example, a tax on corporate income might be posited to be passed on to consumers in the form of higher prices. One way of testing this hypothesis would be to test the hypothesis that prices are correlated with the tax. Another common hypothesis tested is that the quantity of a good demanded depends on the price of the good. Another repeatedly confirmed hypothesis is that variation in the money supply in an economy is associated with variation in the price level of the economy. In all of these cases correlation is easily shown; that is to say, all of these hypotheses have been largely confirmed. Again, the drawback in this type of analysis is that while hypothesis tests can establish correlation between variables, they cannot explain how and why systems function as they do. For example, does a change in the money supply lead to a change in prices? Or, conversely, does a change in prices lead to a change in the money supply? Does some other variable, or variables, lead to a change in both the money supply and the price level? Differing reasonable explanations abound. Thus, while certain confirmed hypotheses might exhibit substantial predictive power, in order to gain a more complete understanding of any subject, empirical testing must be embedded in a larger context of historical and theoretical reasoning about the world.
Much of the research in the social sciences (and various business applications) relies on statistical methods that allow the researcher to make general statements about a population from information derived from a sample. These statistical methods then allow the researcher to separate the effects of systematic variation of a variable from mere chance effects. As mentioned, this technique is especially useful in the social sciences because many phenomena cannot be isolated or controlled in a laboratory-type setting, as in the physical sciences. Many tests of economic hypotheses, for example, take the form of testing parameters of linear regression models. To illustrate, suppose an economic relation is hypothesized to take the form
$$Y = XB + e$$
where Y is supposed to represent observations of the dependent variable and X is supposed to represent observations of explanatory (or causal) variables. The quantity B is a set of coefficients that expresses the relationship between the independent variables and the hypothesized dependent variable, while e is a vector of residual terms that are assumed to be independent of one another (or random). Hypothesis tests could then be formulated by placing restrictions on one or more of the coefficients and testing whether certain variables (alone or in concert) have an effect on Y. Thus, one might hypothesize that consumption expenditures are related to income, or wages, wealth, and certain other variables. We could then posit the null hypothesis that, for example, consumption is not a function of income, holding other variables constant (i.e., that the income coefficient in B is zero). Then, if the null hypothesis is rejected, that would imply that a measurable portion of the variation in consumption expenditures (captured in the parameter B) is explained by the variation in income.
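A test of a regression coefficient can be sketched for the simplest case of a single explanatory variable. This is a simplification of the general model described above, which allows several explanatory variables; the income and consumption figures are hypothetical, and the function name `slope_t_statistic` is illustrative.

```python
import math


def slope_t_statistic(x, y):
    """Fit y = a + b*x by least squares; return (b, t) for H0: b = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # estimated slope coefficient
    a = mean_y - b * mean_x            # estimated intercept
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return b, b / standard_error


# Hypothetical data: income and consumption, both in thousands.
income = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
consumption = [18, 21, 27, 29, 35, 37, 44, 45, 52, 54]

b, t = slope_t_statistic(income, consumption)
T_CRITICAL = 2.306  # two-tailed 5% critical value of t with n - 2 = 8 df
print(f"slope = {b:.3f}, t = {t:.1f}, reject H0: {abs(t) >= T_CRITICAL}")
```

The t statistic follows the same pattern as other test statistics: the estimated coefficient minus its hypothesized value (zero), divided by its standard error.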
USES AND ABUSES OF HYPOTHESIS TESTING
In spite of claims that scientists are passive recipients of facts about the world, the questioning of how and which hypotheses are tested is, in fact, a complicated social process that involves issues of a particular society's collective or dominant values: our perceptions of the world shape our understanding of the world, and our understanding of the world contributes to how we in turn act upon our world. In other words, the type of questions that are asked is itself a product of many factors, including the inherited historical knowledge, dominant values, and ideology of a particular society. Without doubt, this knowledge influences the society's technological and social trajectory.
But the dominant notion of the scientist is one of the neutral observer trying to make sense of a complicated world. In their labors, scientists obtain information about the world and formulate propositions in the form of refutable hypotheses. The goal is to find regularities concealed by random disturbances. In this way, primary causal relations may be separated from those phenomena that are generated by chance. The accepted hypotheses are then accretions to scientific knowledge. Often the people who test hypotheses are separate from those who think about and interpret the results of empirical tests. Thus, for example, one often finds theoretical physicists and theoretical economists as distinct from applied economists and applied physicists. In any case, it is the facts that speak to the observer. Not surprisingly, then, one of the most fundamental notions of positivist science is the separation of analytical (often called metaphysical or logical) arguments (not directly observable) from empirical (by definition testable) statements. One of the crudest versions of this method elevates prediction as the best way to judge the validity of a theory, regardless of its assumptions. Whether prediction is the most desirable test of the validity of any theory is not, of course, a settled issue.
Thus, one of the philosophical tenets of the method of positivist science puts forward a view of the scientific investigator as the neutral observer of historical and physical phenomena, one who assumes the role of selecting and testing facts. Of course, facts always require interpretation. Indeed, some would argue that we can't separate science from ideology, as everyone speaks from some point of view, but we can openly recognize perspectives for what they are. In this sense it may be inaccurate to view science as a strictly neutral observation of the world, particularly in fields where there are many competing interpretations of the facts; of course, this doesn't mean that basic and uncontested scientific ideas need to be scrutinized by every lay observer.
Within any particular natural or social science, hypotheses that have been confirmed (by replication and verification) and accepted are often elevated to the status of laws. Laws are valued because they have substantial predictive power and because they can account for certain regularities in nature or society. These laws, however, do not explain the regularities, the facts; they only describe them. In other words, to explain why a phenomenon occurs we turn to a larger context, typically to abstract forces for which often no direct observational evidence exists but which may be discerned by the array of phenomena generated by these forces. For example, one cannot observe gravity directly but one can observe (measure, test, etc.) the phenomena that the force of gravity generates in different contexts (e.g., a person jumping off a building will fall at a particular speed, the moon revolving around earth will travel a particular path at a particular speed). The strength of hypothesis testing lies in its ability to glean patterns in an apparently chaotic world, thereby directing the researcher towards which phenomena to look for and what questions to ask.
[John A. Sarich]