## Marketing (Statistics) & Applied Probability Models

(Research Starters)

The success of the marketing function within an organization is key to the success of the organization as a whole. Mathematical models can help marketers answer questions about marketplace needs and buyer behavior by providing a mathematical representation of the system or situation being studied. Although the real world is infinitely complex, models should still strive to be succinct and parsimonious. Empirical observations, combined with a review of the literature, give the analyst a good starting point from which to decide which variables to include or exclude in the model-building process. A number of approaches are available to help analysts determine which variables will be most predictive in a model. Marketing models can be used in a wide variety of situations. Three of these are determining customer lifetime value, optimizing sales force deployment, and optimizing the mix between expert judgment and statistical technique in building the best model for direct marketing.

Dr. Ruth A. Wienclaw holds a doctorate in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human/systems integration.

Keywords Buyer Behavior; Consumer; Customer Lifetime Value; Customer Relationship Management; Dependent Variable; Empirical; Forecasting; Marketing; Model; Probability; Regression; Stochastic; Variable

### Statistics: Marketing (Statistics)

### Overview

The success of the marketing function within an organization is key to the success of the organization as a whole. Marketing involves creating, communicating, and delivering value to consumers in ways that benefit the organization and its stakeholders. In addition, modern marketing theory emphasizes customer relationship management, the process of identifying prospective customers, acquiring data concerning these prospective and current customers, building relationships with customers, and influencing their perceptions of the organization and its products or services. Because of the importance and the complexity of these tasks, many marketing departments rely on the use of mathematical models to help them forecast buyer behavior under various sets of variables and "what if" scenarios. Marketing is concerned with both the description of actual, observed behavior and the prediction of future behavior. For example, one might be interested to know why customers prefer a widget over a gizmo. One might also be interested to know whether customers might prefer a gizmo if it were redesigned to improve certain characteristics. Mathematical models can help marketers answer these questions by providing a mathematical representation of the system or situation being studied.

Although the real world is infinitely complex, models should still strive to be succinct and parsimonious. Modeling science can take into account only a finite number of variables, so part of the modeler's task is to determine which variables should and should not be included in the model-building process. In general, this means considering only those factors that are relevant to the central research question. For example, if one is interested in predicting whether a proposed turquoise widget will appeal to women, the question could be broken down by an almost endless list of demographic variables, such as age range, education, area of the country, national origin, and whether or not the women are cat lovers. Although these variables may have some effect on prospective customers' decisions to purchase the new widget, they are less relevant to the central question concerning women and the color turquoise and should, therefore, probably be eliminated from the model unless they are theorized to be central to the question at hand. A good model needs to have both fit (i.e., it accurately models the real-world situation) and robustness (i.e., it accurately predicts future behavior). A model that attempts to account for every observation usually predicts poorly or yields predictions that are too ambiguous to be of much practical use. To achieve stronger predictions, one must often compromise and predict fewer situations. The best models strike the appropriate balance between predicting in all circumstances and predicting accurately; models that fail to strike this balance have little practical applicability. Two well-known and established statistical techniques used in mathematical modeling for market analysis are conjoint analysis and the Dirichlet model.

Empirical observations in combination with a review of the literature will give the analyst a good starting point from which to decide which variables to include or exclude in the model-building process. Typically, theorists and researchers will have given serious thought to the relationships between variables, and the literature will express state-of-the-art thinking about various aspects of the universe of data being examined, building upon previous research and models. Similarly, empirical observations of trends and relationships by marketing personnel or management in the organization can lead to other strong assumptions that are good points of departure for building a model of buyer behavior. However, strong assumptions alone are insufficient: To be useful, a model also needs to be testable. Models are only of use if they have validity and reliability. Validity means that the model accurately predicts what it is intended to predict. Reliability means that the model consistently measures what it is intended to measure. A model cannot be valid unless it is reliable.

There are a number of approaches to selecting variables for marketing models:

- Forward Selection Approach
- Backward Elimination Approach
- Stepwise Approach
- R-squared Approach
- Rule-of-Thumb Approach

### Forward Selection Approach

The forward selection approach starts with an empty model and adds variables to it until every variable that makes a significant contribution has been incorporated. Before a variable is added, a test statistic is calculated to determine its contribution to the model. If the test statistic is greater than a predetermined value, the variable is added to the model; if it is less than the predetermined value, the variable is not added. This process is repeated for each potential variable of interest until the model contains all of the variables that make a significant contribution.
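
The procedure can be sketched in Python. The source does not name a test statistic, so this minimal sketch assumes ordinary least squares (OLS) regression with the partial F statistic and an illustrative entry cutoff of 4.0. In each round it tries every remaining candidate and adds the one with the largest partial F above the cutoff. The helper names (`ols_rss`, `partial_f`, `forward_selection`) and the small coded data set, in which hypothetical sales depend on a discount and on advertising spending but not on a `cat_lover` indicator, are illustrative assumptions, not part of the source.

```python
# Minimal forward-selection sketch. Function names, the F cutoff, and the data are
# hypothetical illustrations; the test statistic is assumed to be the partial F.


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], b[k], b[piv] = A[piv], A[k], b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * ak for a, ak in zip(A[r], A[k])]
            b[r] -= f * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def partial_f(small, big, candidates, y):
    """Partial F statistic for the variables in `big` that are not in `small`."""
    rss_small = ols_rss([candidates[s] for s in small], y)
    rss_big = ols_rss([candidates[s] for s in big], y)
    df_resid = len(y) - len(big) - 1  # the intercept counts as a parameter
    return ((rss_small - rss_big) / (len(big) - len(small))) / (rss_big / df_resid)


def forward_selection(candidates, y, f_to_enter=4.0):
    """Start empty; each round, add the candidate with the largest partial F above f_to_enter."""
    selected = []
    while True:
        best, best_f = None, f_to_enter
        for name in candidates:
            if name in selected:
                continue
            f = partial_f(selected, selected + [name], candidates, y)
            if f > best_f:
                best, best_f = name, f
        if best is None:  # no remaining variable clears the cutoff
            return selected
        selected.append(best)


# Hypothetical coded data (+1/-1 levels of a 2^3 design). "noise" is the three-way
# interaction, chosen so it is uncorrelated with every predictor and results are exact.
discount = [-1, 1, -1, 1, -1, 1, -1, 1]
ad_spend = [-1, -1, 1, 1, -1, -1, 1, 1]
cat_lover = [-1, -1, -1, -1, 1, 1, 1, 1]
noise = [a * b * c for a, b, c in zip(discount, ad_spend, cat_lover)]
sales = [10 + 3 * d + 2 * a + 0.5 * e for d, a, e in zip(discount, ad_spend, noise)]
candidates = {"discount": discount, "ad_spend": ad_spend, "cat_lover": cat_lover}

print(forward_selection(candidates, sales))  # ['discount', 'ad_spend']
```

With these data, `discount` enters first (partial F of about 12.7), `ad_spend` enters second, and `cat_lover` never clears the cutoff. The hand-written solver only keeps the sketch self-contained; a statistics library's regression routines would normally take its place.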

### Backward Elimination Approach

Whereas the forward selection approach starts with an empty model and adds variables to it, the backward elimination approach starts with a model fully populated with all potential variables and then removes those that do not contribute significantly to the model. As with the forward selection approach, the backward elimination approach ends with a model in which every included variable has a test statistic greater than the predetermined value.
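
Backward elimination can be sketched in Python as follows. This minimal version assumes OLS regression with the partial F statistic and an illustrative removal cutoff of 4.0: it starts from the full model and repeatedly drops the variable with the smallest partial F until every remaining variable clears the cutoff. The helper names (`ols_rss`, `partial_f`, `backward_elimination`) and the coded data, in which hypothetical sales depend on `discount` and `ad_spend` but not on `cat_lover`, are assumptions made for illustration.

```python
# Minimal backward-elimination sketch. Function names, the F cutoff, and the data are
# hypothetical illustrations; the test statistic is assumed to be the partial F.


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], b[k], b[piv] = A[piv], A[k], b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * ak for a, ak in zip(A[r], A[k])]
            b[r] -= f * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def partial_f(small, big, candidates, y):
    """Partial F statistic for the variables in `big` that are not in `small`."""
    rss_small = ols_rss([candidates[s] for s in small], y)
    rss_big = ols_rss([candidates[s] for s in big], y)
    df_resid = len(y) - len(big) - 1  # the intercept counts as a parameter
    return ((rss_small - rss_big) / (len(big) - len(small))) / (rss_big / df_resid)


def backward_elimination(candidates, y, f_to_remove=4.0):
    """Start full; repeatedly drop the variable with the smallest partial F below f_to_remove."""
    selected = list(candidates)
    while selected:
        # Partial F for removing each variable from the current model.
        scores = {
            name: partial_f([s for s in selected if s != name], selected, candidates, y)
            for name in selected
        }
        weakest = min(scores, key=scores.get)
        if scores[weakest] >= f_to_remove:  # every remaining variable is significant
            break
        selected.remove(weakest)
    return selected


# Hypothetical coded data (+1/-1 levels of a 2^3 design). "noise" is the three-way
# interaction, chosen so it is uncorrelated with every predictor and results are exact.
discount = [-1, 1, -1, 1, -1, 1, -1, 1]
ad_spend = [-1, -1, 1, 1, -1, -1, 1, 1]
cat_lover = [-1, -1, -1, -1, 1, 1, 1, 1]
noise = [a * b * c for a, b, c in zip(discount, ad_spend, cat_lover)]
sales = [10 + 3 * d + 2 * a + 0.5 * e for d, a, e in zip(discount, ad_spend, noise)]
candidates = {"discount": discount, "ad_spend": ad_spend, "cat_lover": cat_lover}

print(backward_elimination(candidates, sales))  # ['discount', 'ad_spend']
```

Here `cat_lover` is dropped in the first pass (its partial F is 0), after which both remaining variables comfortably exceed the cutoff and the procedure stops.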

### Stepwise Approach

A third approach to selecting variables is the stepwise approach, which is a variation of the forward selection method. Unlike the forward selection approach, however, the stepwise approach does not guarantee that variables already in the model will remain in it. As in the forward selection approach, variables are added one at a time after being tested with the test statistic. However, the stepwise approach also re-examines the variables already included and deletes any that no longer have a test statistic greater than the predetermined value.
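
A minimal Python sketch of the stepwise approach follows. It assumes OLS regression with the partial F statistic and illustrative cutoffs of 4.0 to enter and 3.9 to remove; keeping the entry cutoff above the removal cutoff means the variable just added cannot be removed in the same step. After each addition, the sketch re-tests every included variable and drops the weakest while it falls below the removal cutoff. The names (`ols_rss`, `partial_f`, `stepwise`, `max_steps`) and the coded data are hypothetical.

```python
# Minimal stepwise-selection sketch. Function names, cutoffs, and the data are
# hypothetical illustrations; the test statistic is assumed to be the partial F.


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], b[k], b[piv] = A[piv], A[k], b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * ak for a, ak in zip(A[r], A[k])]
            b[r] -= f * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def partial_f(small, big, candidates, y):
    """Partial F statistic for the variables in `big` that are not in `small`."""
    rss_small = ols_rss([candidates[s] for s in small], y)
    rss_big = ols_rss([candidates[s] for s in big], y)
    df_resid = len(y) - len(big) - 1  # the intercept counts as a parameter
    return ((rss_small - rss_big) / (len(big) - len(small))) / (rss_big / df_resid)


def stepwise(candidates, y, f_to_enter=4.0, f_to_remove=3.9, max_steps=100):
    """Forward selection that re-tests the included variables after every addition."""
    selected = []
    for _ in range(max_steps):  # guard against cycling
        # Forward step: add the remaining candidate with the largest partial F.
        best, best_f = None, f_to_enter
        for name in candidates:
            if name not in selected:
                f = partial_f(selected, selected + [name], candidates, y)
                if f > best_f:
                    best, best_f = name, f
        if best is None:
            break
        selected.append(best)
        # Backward check: drop included variables whose partial F fell below f_to_remove.
        while selected:
            scores = {
                name: partial_f([s for s in selected if s != name], selected, candidates, y)
                for name in selected
            }
            weakest = min(scores, key=scores.get)
            if scores[weakest] >= f_to_remove:
                break
            selected.remove(weakest)
    return selected


# Hypothetical coded data (+1/-1 levels of a 2^3 design). "noise" is the three-way
# interaction, chosen so it is uncorrelated with every predictor and results are exact.
discount = [-1, 1, -1, 1, -1, 1, -1, 1]
ad_spend = [-1, -1, 1, 1, -1, -1, 1, 1]
cat_lover = [-1, -1, -1, -1, 1, 1, 1, 1]
noise = [a * b * c for a, b, c in zip(discount, ad_spend, cat_lover)]
sales = [10 + 3 * d + 2 * a + 0.5 * e for d, a, e in zip(discount, ad_spend, noise)]
candidates = {"discount": discount, "ad_spend": ad_spend, "cat_lover": cat_lover}

print(stepwise(candidates, sales))  # ['discount', 'ad_spend']
```

Because these illustrative predictors are uncorrelated, no variable is ever removed once added; the removal check matters when predictors overlap, so that a later variable can make an earlier one redundant.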

### R-squared Approach

A fourth approach to variable selection is the *R*-squared approach. This approach is used to find multiple subsets of variables that best predict the dependent variable using an appropriate test statistic. This approach can be used to find the best one-variable model, the best two-variable model, and so forth.
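
A minimal Python sketch of this idea scores every subset of each size by *R*-squared (1 minus the ratio of the residual sum of squares to the total sum of squares) under an OLS fit and keeps the best subset per size. The names (`ols_rss`, `best_subsets`) and the coded data are hypothetical, and exhaustive search examines 2^p subsets, so it is practical only for a modest number of candidate variables.

```python
# Minimal best-subsets-by-R^2 sketch. Function names and the data are hypothetical.
from itertools import combinations


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], b[k], b[piv] = A[piv], A[k], b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * ak for a, ak in zip(A[r], A[k])]
            b[r] -= f * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def best_subsets(candidates, y):
    """For each subset size k, return (R^2, subset) for the subset with the highest R^2."""
    tss = ols_rss([], y)  # intercept-only RSS equals the total sum of squares
    names = list(candidates)
    best = {}
    for k in range(1, len(names) + 1):
        best[k] = max(
            (1 - ols_rss([candidates[s] for s in subset], y) / tss, subset)
            for subset in combinations(names, k)
        )
    return best


# Hypothetical coded data (+1/-1 levels of a 2^3 design). "noise" is the three-way
# interaction, chosen so it is uncorrelated with every predictor and results are exact.
discount = [-1, 1, -1, 1, -1, 1, -1, 1]
ad_spend = [-1, -1, 1, 1, -1, -1, 1, 1]
cat_lover = [-1, -1, -1, -1, 1, 1, 1, 1]
noise = [a * b * c for a, b, c in zip(discount, ad_spend, cat_lover)]
sales = [10 + 3 * d + 2 * a + 0.5 * e for d, a, e in zip(discount, ad_spend, noise)]
candidates = {"discount": discount, "ad_spend": ad_spend, "cat_lover": cat_lover}

for k, (r2, subset) in best_subsets(candidates, sales).items():
    print(k, subset, round(r2, 3))
# 1 ('discount',) 0.679
# 2 ('discount', 'ad_spend') 0.981
# 3 ('discount', 'ad_spend', 'cat_lover') 0.981
```

The best two- and three-variable models tie at 0.981 because *R*-squared never decreases when a variable is added. Choosing among model sizes therefore calls for judgment or a penalized measure such as adjusted *R*-squared.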

### Rule-of-Thumb Approach

A fifth approach is the rule-of-thumb approach. This approach selects the variables most strongly associated with the dependent variable, as measured by the Pearson correlation coefficient *r*. The variables are ranked by their *r* values, and the top *k* ranked variables (where *k* is set in advance) are included in the model. If a regression model containing these variables shows that every variable has a test statistic greater than the predetermined value, the set is deemed to be the best.
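
A minimal Python sketch of the rule-of-thumb approach is shown below. It ranks candidates by the absolute value of *r* (an assumption, so that strong negative associations also count), keeps the top *k*, and then checks each chosen variable's partial F statistic in a joint OLS fit against an illustrative cutoff of 4.0. The source does not say what happens when a variable fails this check, so the sketch simply reports the outcome as a flag. The names (`ols_rss`, `partial_f`, `pearson_r`, `rule_of_thumb`) and the coded data are hypothetical.

```python
# Minimal rule-of-thumb sketch. Function names, the F cutoff, k, and the data are
# hypothetical illustrations; the follow-up test statistic is assumed to be the partial F.
from math import sqrt


def ols_rss(cols, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination with partial pivoting.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv], b[k], b[piv] = A[piv], A[k], b[piv], b[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * ak for a, ak in zip(A[r], A[k])]
            b[r] -= f * b[k]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        beta[k] = (b[k] - sum(A[k][j] * beta[j] for j in range(k + 1, p))) / A[k][k]
    return sum((yi - sum(x * bj for x, bj in zip(row, beta))) ** 2 for row, yi in zip(X, y))


def partial_f(small, big, candidates, y):
    """Partial F statistic for the variables in `big` that are not in `small`."""
    rss_small = ols_rss([candidates[s] for s in small], y)
    rss_big = ols_rss([candidates[s] for s in big], y)
    df_resid = len(y) - len(big) - 1  # the intercept counts as a parameter
    return ((rss_small - rss_big) / (len(big) - len(small))) / (rss_big / df_resid)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rule_of_thumb(candidates, y, k=2, f_cutoff=4.0):
    """Keep the top-k variables by |r|; report whether each clears f_cutoff in the joint fit."""
    ranked = sorted(candidates, key=lambda s: abs(pearson_r(candidates[s], y)), reverse=True)
    top = ranked[:k]
    all_significant = all(
        partial_f([s for s in top if s != name], top, candidates, y) > f_cutoff
        for name in top
    )
    return top, all_significant


# Hypothetical coded data (+1/-1 levels of a 2^3 design). "noise" is the three-way
# interaction, chosen so it is uncorrelated with every predictor and results are exact.
discount = [-1, 1, -1, 1, -1, 1, -1, 1]
ad_spend = [-1, -1, 1, 1, -1, -1, 1, 1]
cat_lover = [-1, -1, -1, -1, 1, 1, 1, 1]
noise = [a * b * c for a, b, c in zip(discount, ad_spend, cat_lover)]
sales = [10 + 3 * d + 2 * a + 0.5 * e for d, a, e in zip(discount, ad_spend, noise)]
candidates = {"discount": discount, "ad_spend": ad_spend, "cat_lover": cat_lover}

print(rule_of_thumb(candidates, sales))  # (['discount', 'ad_spend'], True)
```

Raising `k` to 3 pulls in `cat_lover`, whose partial F of 0 fails the check, so the flag becomes `False`. Unlike the stepwise methods, this approach fixes the model size in advance and uses the regression only to confirm the choice.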

Although it is important to choose the right variables when building a model, it is also important to remember that models are not set in stone. Model building is an iterative process, and a model that does not meet the tests of validity and reliability can be refined to better represent the real world. Indeed, the factors influencing customer behavior change over time, and the marketing model needs to be flexible to reflect...

(The entire section is 3998 words.)