Introduction to Nonparametric Methods
Not all real-world situations yield interval- or ratio-level data that meet the assumptions parametric statistics make about the distribution underlying the data. For such situations, nonparametric statistical techniques are often available that enable one to test hypotheses and draw inferences from the data. Although nonparametric statistics have several advantages, they are not without disadvantages. Some of the more commonly used nonparametric statistics include equivalents of the t-tests, coefficients of correlation, and analysis of variance, as well as ways to test whether or not data are random. Although these tests are invaluable in certain situations, in the end, for situations where parametric techniques are available and their assumptions are met, it is always preferable to use parametric rather than nonparametric analysis.
Keywords Analysis of Variance; Correlation; Distribution; Hypothesis; Inferential Statistics; Nonparametric Statistics; Normal Distribution; Parametric Statistics; Population; Sample; Statistical Significance; Statistics; Variable
Statistics: Introduction to Nonparametric Methods
Most inferential statistics that are commonly used in applied settings are parametric and make certain assumptions about the parameters of the data and the distribution of the underlying population from which a sample is drawn. Commonly used inferential statistics such as t-tests, analyses of variance, and Pearson product moment correlation coefficients assume that the data being analyzed have been randomly selected from a population that has a normal distribution. In addition, parametric statistics require data that are interval or ratio in nature. That is, not only do the rank orders of the data have meaning (e.g., a value of 6 is greater than a value of 5) but the intervals between the values also have meaning. For example, it is clear that the difference between 1 gram of a chemical compound and 2 grams of the compound is the same as the difference between 100 grams and 101 grams of the compound. These measurements have meaning because the weight scale has a true zero (i.e., we know what it means to have 0 grams of the compound) and the intervals between values are equal. On the other hand, it may not be quite as clear that the difference between 0 and 1 on a 100-point rating scale of the quality of a widget is the same as the difference between 50 and 51 or between 98 and 99. These are value judgments, and the scale may not have a true zero. For example, the scale may go from 1 to 100 and not include a 0. Similarly, even if the scale does start at 0, it may be difficult to define what this value means. Does a 0 on this scale differ significantly from a score of 1? Both scores indicate that the rater did not like the widget at all. However, one cannot tell from these scores why the rater assigned the values. Harvey may dislike one particular feature of the widget and, therefore, dismiss the entire product as unacceptable.
Mathilde may like the gizmo better than the widget and, therefore, dislike the widget by comparison, rating it low as a comparative judgment rather than an absolute one. Even if the various points on the scale were well defined, Harvey may think that a widget he does not like should be given a rating of 20, while Mathilde may think that a widget she does not like should be given a value of 2. Ratings are subjective, and although numerical values may be assigned to them, these values do not necessarily meet the requirement of parametric statistics that the data be at the interval or ratio level. Similarly, the real world puts practical limits on the quality of data that can be gathered. For example, if performing a quick survey in a shopping mall, one may only have time to ask people which of two soft drinks they prefer. Although the resultant data show that one beverage is preferred over the other, they do not give the analyst any idea of the degree to which this is true.
Uses of Nonparametric Statistics
Fortunately, one need not rely on parametric statistics or forego statistical analysis completely in situations where the data do not meet the assumptions of parametric statistics. A number of nonparametric procedures are available that correspond to the common tests used when the shape and parameters of a distribution are known. Nonparametric tests are so called because they make no assumptions about the underlying distribution. To deal with data that are neither interval nor ratio in nature, or where assumptions about the underlying distribution cannot reasonably be made, one needs to use nonparametric rather than parametric statistics. Nonparametric statistical techniques are used in situations where it is not possible to estimate or test the values of the parameters of the distribution (e.g., mean, standard deviation) or where the shape of the underlying distribution is unknown. In addition, nonparametric statistics often can be used in situations where only ordinal or ranked data are available (i.e., where the intervals between the data points may be uneven). Some nonparametric statistical techniques are even available for use with nominal data. Although nonparametric statistical techniques are not as powerful as standard parametric statistics, they do allow the analyst to derive meaningful information from a less-than-perfect data set.
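To illustrate how a nonparametric technique handles ordinal data, consider Spearman's rank correlation coefficient, a common nonparametric counterpart to the Pearson product moment coefficient: it is the Pearson formula applied to the ranks of the observations rather than to their raw values, so only the rank order of the data is assumed to be meaningful. The sketch below is a minimal pure-Python illustration; the function names and the two judges' ratings are hypothetical examples, not data from this article.

```python
from statistics import mean

def ranks(values):
    """Assign 1-based ranks to values, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical ordinal ratings of five widgets by two judges:
# only the rank order of the scores is assumed to carry meaning.
judge_a = [3, 1, 4, 2, 5]
judge_b = [2, 1, 5, 3, 4]
print(spearman(judge_a, judge_b))  # prints 0.8
```

Because the computation uses ranks rather than raw scores, it is unaffected by Harvey's and Mathilde's different personal scales: any monotone relabeling of one judge's ratings leaves the coefficient unchanged.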
Advantages to Nonparametric Statistics
Although these characteristics may seem to imply that nonparametric statistics are somehow inferior to parametric statistics, there are, in fact, several advantages to using nonparametric statistics.
- First, nonparametric statistics are less demanding about the characteristics of the data and their underlying distribution. Parametric statistics can be validly used only in situations where certain underlying assumptions are met, particularly if the sample sizes are small. For example, the one-sample Student's t-test requires that the underlying population distribution be normal. Further, for the independent-samples t-test, there is the additional requirement that the population standard deviations be equal. If these assumptions do not hold and the statistical technique is used anyway, the results of the analysis cannot be trusted. The nonparametric equivalents of these tests, on the other hand, do not make these assumptions.
- Second, nonparametric statistics frequently require less time and effort to calculate, particularly for small sample sizes. For example, the nonparametric sign test provides the analyst with a quick test of whether or not two treatments are equally effective just by counting the number of times one treatment is better than the other.
- Third, nonparametric statistics can be used to provide some objectivity in situations where there is no reliable underlying scale for the data or where the use of parametric statistics would depend on an artificial metric. In fact, some nonparametric statistics are available for use with both nominal data (i.e., data that only indicate in which category or class a data point belongs but indicate nothing about the relative intervals between data points) and ordinal data (i.e., data on a scale that can be rank ordered but that indicates nothing about the intervals between the data points).
- Fourth, in some situations, even though sufficient interval or ratio data are available, they have not been randomly sampled from a larger population and there is no way to acquire a random sample. This often occurs in real world situations where analysis needs to be performed on existing data and the analyst or researcher cannot collect data that meet the requirements of parametric statistics. When this situation occurs, standard parametric statistics cannot be used. However, the data can sometimes be analyzed using nonparametric statistics.
- Finally, in some situations, nonparametric statistics offer the only choice for analyzing data.
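The sign test mentioned in the second point above can be sketched in a few lines: count how often one treatment outperforms the other across paired observations, drop ties, and compute a two-sided binomial probability under the null hypothesis that each non-tied pair is equally likely to favor either treatment. The paired scores below are hypothetical, invented for illustration.

```python
from math import comb

def sign_test(pairs):
    """Two-sided sign test for paired observations.

    pairs: list of (a, b) outcome pairs; ties (a == b) are dropped,
    as is conventional for the sign test.
    Returns (number of pairs favoring a, number of non-tied pairs, p-value).
    """
    diffs = [a - b for a, b in pairs if a != b]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    k = min(plus, n - plus)
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, n, min(1.0, 2 * p)

# Hypothetical paired scores for treatments A and B on ten subjects.
pairs = [(7, 5), (6, 6), (8, 4), (5, 6), (9, 7),
         (6, 3), (7, 5), (4, 5), (8, 6), (7, 4)]
plus, n, p = sign_test(pairs)
print(plus, n, round(p, 4))  # prints: 7 9 0.1797
```

Note that the test uses only the direction of each difference, never its size, which is exactly why it imposes no distributional assumptions on the data.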
Disadvantages of Nonparametric Statistics