In business, one needs to examine data from many sources in order to determine the best strategy for success. Descriptive statistics offers numerous techniques for organizing numerical data so that they can be presented in a form that humans can easily assimilate. Common methods for organizing and arranging data include the stem and leaf plot and the box and whiskers plot. In addition, the stem and leaf plot can be used in developing one of the most frequently used methods for graphically depicting data: the frequency distribution. Often, the class midpoints of a frequency distribution are plotted and connected by line segments to form a frequency polygon. These graphs may also be translated into ogives, or cumulative frequency polygons. Quality control uses statistical tools to raise the level of quality and reduce defects and waste. Descriptive tools used in quality control include Pareto charts, scatter plots, and Shewhart control charts.
Human beings are constantly bombarded by data. In business, one needs to examine data from many sources, such as customer feedback, competitors' actions, and marketplace trends, in order to determine the best strategy for success. Even within these categories, data need to be organized, described, and presented in ways that help people comprehend them and use them to solve problems and make decisions. For example, if the marketing department wanted to know customers' reactions to a proposed new widget design before the company decided whether to introduce the product to the market at large, it might let a sample of potential customers use the widget and then complete a survey about their reactions. Once organized and analyzed, these survey data could be invaluable inputs to the decision; a pile of 1,000 surveys sitting on the corner of someone's desk is not. To help solve this problem and to prepare the data for further analysis, the volume of data to be handled is frequently condensed using one of a number of graphing techniques.
Techniques for Organizing Data
Descriptive statistics offers numerous techniques for organizing numerical data so that they can be presented in a form that humans can easily assimilate. Take, for example, the following collection of 50 data points:
37 12 41 41 74 60 30 28 52 65 54 91 42 37 43 57 38 6 53 30 56 34 39 94 61 65 48 59 27 2 75 72 83 30 71 20 25 28 46 23 65 13 15 7 13 14 46 58 34 20
The numbers are in no particular order, and it is difficult to tell at a glance whether the number 53 is included in the set. If these were raw data from potential customers rating a new widget design on a 100-point scale, it would be extremely difficult to tell whether the new design was successful. One could reduce the confusion somewhat by arranging the raw data in numerical order:
2 6 7 12 13 13 14 15 20 20 23 25 27 28 28 30 30 30 34 34 37 37 38 39 41 41 42 43 46 46 48 52 53 54 56 57 58 59 60 61 65 65 65 71 72 74 75 83 91 94
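Ordering the data helps a computer as much as a human reader: once the values are sorted, a question such as "is 53 in the set?" can be answered by binary search instead of by scanning every value. The following is a minimal Python sketch, assuming the ratings are the 50 integers listed above; the names `ratings`, `ordered`, and `contains` are illustrative only and do not come from any particular statistics package.

```python
from bisect import bisect_left

# Raw widget ratings from the example (50 responses on a 100-point scale).
ratings = [37, 12, 41, 41, 74, 60, 30, 28, 52, 65,
           54, 91, 42, 37, 43, 57, 38, 6, 53, 30,
           56, 34, 39, 94, 61, 65, 48, 59, 27, 2,
           75, 72, 83, 30, 71, 20, 25, 28, 46, 23,
           65, 13, 15, 7, 13, 14, 46, 58, 34, 20]

# Arrange the ratings in ascending numerical order (the raw list is unchanged).
ordered = sorted(ratings)


def contains(sorted_values, target):
    """Hypothetical helper: binary search, True if target occurs in sorted_values."""
    i = bisect_left(sorted_values, target)  # leftmost position where target could go
    return i < len(sorted_values) and sorted_values[i] == target


print(ordered)
print(contains(ordered, 53))  # True
print(contains(ordered, 50))  # False
```

Sorting costs O(n log n) once; after that, each membership question takes only O(log n) comparisons.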
It is now much easier to see that the number 53 is included in the data set. However, it is still not readily apparent whether the group of people surveyed liked the new widget. One way to organize the data so that the answer to this question is clearer is to group them into intervals (classes) and graph the results as a frequency histogram. A histogram is a type of vertical bar chart that plots the number of observations falling within each class (the frequency) on the y-axis against the classes on the x-axis, drawing each frequency as a rectangle. One can use whatever interval size is convenient. For example, if the data set ran from 1 to 1,000, one might group the data in intervals of 100 (e.g., 1-100, 101-200, and so on). If the range of the data (i.e., the difference between the highest and the lowest values in the data set) is smaller, smaller intervals are more appropriate; the widget ratings, for instance, have a range of 94 - 2 = 92.
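The grouping step can be sketched in Python as a frequency table. This is a minimal sketch that assumes integer ratings and a class width of 10, an illustrative choice suited to a 100-point scale; `CLASS_WIDTH` and the other names are hypothetical. Each class is printed with its frequency and a row of `#` marks as a rough text stand-in for the histogram's rectangles.

```python
from collections import Counter

# Raw widget ratings from the example (50 responses on a 100-point scale).
ratings = [37, 12, 41, 41, 74, 60, 30, 28, 52, 65,
           54, 91, 42, 37, 43, 57, 38, 6, 53, 30,
           56, 34, 39, 94, 61, 65, 48, 59, 27, 2,
           75, 72, 83, 30, 71, 20, 25, 28, 46, 23,
           65, 13, 15, 7, 13, 14, 46, 58, 34, 20]

CLASS_WIDTH = 10  # hypothetical interval width; any convenient width works

# Range: difference between the highest and the lowest values.
data_range = max(ratings) - min(ratings)

# Map each rating to a class index by integer division: 0-9 -> 0, 10-19 -> 1, ...
counts = Counter(r // CLASS_WIDTH for r in ratings)

print(f"range = {data_range}")
for k in range(min(counts), max(counts) + 1):
    lower = k * CLASS_WIDTH
    upper = lower + CLASS_WIDTH - 1
    freq = counts[k]  # a Counter returns 0 for classes with no ratings
    print(f"{lower:>2}-{upper:<2} | {freq:>2} {'#' * freq}")
```

A wider class width yields fewer, taller bars; a narrower one shows more detail but a more jagged shape.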
The most basic set of tools used in descriptive statistics comprises various graphing techniques that help organize and summarize data so that they are more easily comprehensible. One common way of arranging data is the stem and leaf plot, a graphing technique in which each data point is split into its rightmost digit (the "leaf") and its remaining leftmost digits (the "stem"). For example, the number 42 has a stem of 4 and a leaf of 2; the number 47 has a stem of 4 and a leaf of 7. Using this technique, the data on customer response to the new widget design look like this:
Stem | Leaf
0 | 2, 6, 7
1 | 2, 3, 3, 4, 5
2 | 0, 0, 3, 5, 7, 8, 8
3 | 0, 0, 0, 4, 4, 7, 7, 8, 9
4 | 1, 1, 2, 3, 6, 6, 8
5 | 2, 3, 4, 6, 7, 8, 9
6 | 0, 1, 5, 5, 5
7 | 1, 2, 4, 5
8 | 3
9 | 1, 4
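Splitting a value into stem and leaf is simply division by 10: the quotient is the stem and the remainder is the leaf. The minimal Python sketch below assumes non-negative integer ratings; `stem_and_leaf` is a hypothetical helper name. It builds the display directly from the raw ratings and prints one row per stem.

```python
from collections import defaultdict

# Raw widget ratings from the example (50 responses on a 100-point scale).
ratings = [37, 12, 41, 41, 74, 60, 30, 28, 52, 65,
           54, 91, 42, 37, 43, 57, 38, 6, 53, 30,
           56, 34, 39, 94, 61, 65, 48, 59, 27, 2,
           75, 72, 83, 30, 71, 20, 25, 28, 46, 23,
           65, 13, 15, 7, 13, 14, 46, 58, 34, 20]


def stem_and_leaf(values):
    """Hypothetical helper: map each stem (tens) to its sorted leaves (units)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps each row's leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 42 -> stem 4, leaf 2
        plot[stem].append(leaf)
    return plot


plot = stem_and_leaf(ratings)
print("Stem | Leaf")
for stem in range(min(plot), max(plot) + 1):  # include any empty stems in the range
    leaves = ", ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Unlike a histogram, this display keeps every original value recoverable, since each stem and leaf pair can be read back as a number.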
The stem and leaf plot gives a better idea of the distribution of the data. For example, we can see that most people rated the new design between 20 and 59 (30 of the 50 ratings). However, the plot also makes it easy to see that not everyone rated the new design in this interval and that there are extreme scores at each end of the distribution. What a stem and leaf plot does not show directly is the median of the distribution (the middle value in the ordered data set).
The median of the distribution can be readily seen through another data presentation technique: the box and whiskers plot (also called a candlestick chart). In this approach to graphing data, the lower and upper quartiles, the median, and the two extreme values of a distribution are used to summarize the data in a compact form. In the example of the widget rating data, the median is 41: with 50 values, it is the average of the 25th and 26th ordered values, both of which are 41. The lower and upper quartiles (labeled Q1 and Q3, respectively, in Figure 4) are found in a similar way, as the medians of the lower and upper halves of the distribution. For the widget rating data, the lower quartile is 26 and the upper quartile is 59.5; these values result from setting aside the two middle observations and taking the median of the 24 values on each side. (Conventions differ: splitting the data into two halves of 25 values each instead gives 27 and 59.) The area between the lower and upper quartiles is enclosed by a rectangle drawn along a number line, and the position of the median is marked within the rectangle. In addition, the extreme values of the data set (in the widget rating example, the lowest value is 2 and the highest is 94) are marked on the number line and connected to the rectangle by lines (the "whiskers").
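The five values a box and whiskers plot needs (minimum, lower quartile, median, upper quartile, maximum) can be computed as in the following minimal Python sketch. It uses the quartile convention that reproduces the figures quoted above: the quartiles are the medians of the observations lying strictly below and strictly above the middle observation(s). That convention and the helper name `five_number_summary` are assumptions for illustration. For comparison, the last line calls the standard library's `statistics.quantiles`, whose default "exclusive" method interpolates between neighboring values and therefore gives slightly different quartiles.

```python
import statistics

# Raw widget ratings from the example (50 responses on a 100-point scale).
ratings = [37, 12, 41, 41, 74, 60, 30, 28, 52, 65,
           54, 91, 42, 37, 43, 57, 38, 6, 53, 30,
           56, 34, 39, 94, 61, 65, 48, 59, 27, 2,
           75, 72, 83, 30, 71, 20, 25, 28, 46, 23,
           65, 13, 15, 7, 13, 14, 46, 58, 34, 20]


def five_number_summary(values):
    """Hypothetical helper returning (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the sorted values lying
    strictly below and strictly above the middle observation(s).
    """
    s = sorted(values)
    n = len(s)
    if n < 3:
        raise ValueError("need at least three observations")
    k = (n - 1) // 2                 # values on each side once the middle one(s) are set aside
    lower, upper = s[:k], s[n - k:]  # for n = 50: the 24 lowest and the 24 highest values
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


low, q1, med, q3, high = five_number_summary(ratings)
print(low, q1, med, q3, high)              # 2 26.0 41.0 59.5 94
print("IQR:", q3 - q1)                     # width of the box: 33.5
print(statistics.quantiles(ratings, n=4))  # [26.5, 41.0, 59.25]
```

Because quartile conventions differ between textbooks and software, it is worth checking which one a given tool uses before comparing box plots produced by different programs.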
The box and whiskers plot summarizes a number of characteristics of the data at a glance. By looking at it, one can tell where the midpoint of the distribution is and where the bulk of the scores lies (i.e., the middle 50 percent of the scores, within the rectangle). In...