## Applied Regression & Analysis of Variance

(Research Starters)

Good business strategy is based on the rigorous analysis of empirical data. Applied statistics can help managers and other organizational decision makers develop better strategies and make better plans to help ensure the success of the organization. Two of the most commonly used statistical tools for giving decision makers the tools they need for these activities are regression analysis and analysis of variance. Regression analysis is a family of statistical techniques that is used to develop mathematical models that can be used for forecasting. Analysis of variance is a family of techniques used to analyze the joint and separate effects of multiple independent variables on a single dependent variable and to determine the statistical significance of the effect. Both regression analysis and analysis of variance can be invaluable in providing managers and other organizational decision makers with the information that they need to make informed decisions and develop plans to increase the effectiveness of the organization.

In business, a strategy is a plan of action to help the organization reach its goals and objectives. A good business strategy is based on the rigorous analysis of empirical data, including market needs and trends, competitor capabilities and offerings, and the organization's resources and abilities. Applied statistics help managers and other organizational decision makers develop better strategies and make better plans to help ensure the success of the organization. Two of the most commonly used statistical tools for giving decision makers the tools they need for these activities are regression analysis and analysis of variance (ANOVA). These tools are part of inferential statistics, a subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences from empirical data, including allowing decision makers to draw conclusions about a population from a sample.

### Regression Analysis

Knowing that there is a relationship between two variables does not always provide managers with sufficient information to make good decisions. In some situations one needs to be able to predict the value of one variable from knowledge of another variable. Regression analysis is a family of statistical techniques that is used to develop mathematical models that can be used for this purpose. Data are typically graphed on a scatter plot (a graph depicting pairs of points for two-variable numerical data) so that a line of best fit can be determined and used to predict the value of the dependent variable based on different values of the independent variable. A sample scatter plot with line of best fit is shown in Figure 1.

For example, a trainer might want to determine the optimal cost of running a seminar based on varying numbers of students. Although more students would mean more income, this situation would also mean more expenses such as costs for handouts, larger conference rooms, more trainers to run small group sessions, and so forth. With too few students, on the other hand, the training course would not pay for itself. A predictive model for cost versus number of trainees could be developed using data collected on these variables for a number of training courses. The slope of the line of best fit passing through the data could be mathematically calculated to determine the equation of the simple regression line. Trainers could then use this equation to determine optimal class sizes for training courses.

Simple linear regression can be invaluable for building models and predicting the value of one variable from the knowledge of the value of another variable. However, real-world business situations are often more complicated than this scenario. Multiple regression allows the prediction of the dependent variable by the use of more than one independent variable. For example, a marketer might want to predict the profitability of a new product line based on multiple factors such as price, product life, and packing. Such information could help the marketer determine whether investing in a new product line would be a sound investment based on the consideration of several variables. Similarly, an engineer might want to predict the life expectancy of a gizmo based on factors such as quality of the steel used in the frame, thickness of the housing, or other factors. This could give the engineer the information needed to make decisions for the most cost-effective design. This type of complex question can be answered using models built by multiple regression analysis.

### Analysis of Variance

Another commonly used statistical technique in business is analysis of variance. This is a family of techniques used to analyze the joint and separate effects of multiple independent variables on a single dependent variable and to determine the statistical significance of the effect. Analysis of variance examines two sources of variability: variability between groups and variability within groups.

- Variability between groups is the variation among the scores of subjects that are treated differently. For example, if one were trying to determine which of three batteries had the longest life, the between groups variability would look at the difference in battery life for batteries X vs. Y vs. Z.
- Within groups variability, the second type of variability of interest in analysis of variance, looks at the variation among the scores of subjects that are treated alike. Within groups variability would look at the variation of scores among all battery Xs tested, all battery Ys tested, and all battery Zs tested. Within groups variability is sometimes referred to as the "error term" because it is due to random error resulting from uncontrolled factors such as individual differences.

Analysis of variance is not a single technique but is actually a family of techniques used with experimental research. In the completely randomized design one-way analysis of variance, subjects are randomly assigned to treatments in a research design that contains only one independent variable with two or more treatment levels. For example, if one wanted to know which of several packaging options people preferred (e.g., the current packaging vs. one or more new packaging options), one could analyze the data with a one-way analysis of variance. Sometimes a second variable (referred to as a blocking variable) is used to control for confounding or extraneous variables that are not being tested in the research. Another design is two-way analysis of variance, which is used when the research design includes two or more independent variables (treatments) that the analyst desires to examine simultaneously. For example, the marketing department may want to know if there is a difference between the way that women and men react to the three packaging options for the widget. In addition to univariate analysis of variance, multivariate analysis of variance (MANOVA) techniques are available that allow the business analyst to test hypotheses on more complex problems involving the simultaneous effects of multiple independent variables on multiple dependent variables.

### Applications

Both regression analysis and analysis of variance can be invaluable in providing managers and other organizational decision makers with the information that they need to make informed decisions and to develop plans to increase...

(The entire section is 3258 words.)