Statistical computing involves the interaction of virtually every aspect of statistical theory and practice with computer science. In many ways, statistical computing forms the boundary between the two disciplines. There are three major approaches to statistical computing programs: single programs, statistical systems or large package programs, and collections of statistical algorithms. Although computers can reduce the errors that may be introduced into calculations by humans, computers also have their own set of associated errors. A wide variety of software application programs are available for performing statistical calculations. These programs range from those that blend with relational database software at one end of the spectrum to those that blend with mathematical software at the other.
Except in class exercises or for very simple statistical calculations with a small sample size, virtually no one today performs statistical analyses by hand. Certainly the classroom experience of learning to calculate by hand is invaluable for understanding how statistics work, but the truth is that computers are so much better than human beings at the actual processing and calculating tasks involved in performing statistical techniques. Human beings still need to design the underlying experiment, determine which statistical technique to use to analyze the data, and interpret the results. However, computers are better than humans at the processing step -- taking the inputs of raw data and turning them into interpretable results. Properly programmed and functioning computers do not reverse numbers or make arithmetic errors as humans are wont to do. In addition, computers excel at processing large amounts of data quickly in a way no human could ever do.
Statistical computing involves the interaction of virtually every aspect of statistical theory and practice as well as nearly every aspect of computer science. Both statistics and computer science are fundamental to all science and together provide complementary tools for scientific endeavors. Statistics is concerned with the accumulation of data, the optimal extraction of information from data, and determining how inferences can be made from data in order to extend knowledge. To do these things, statistics often involves processing or combining data either numerically or symbolically, a task at which computer science excels. Computer science deals with ways to optimize these processes, representing information and knowledge in useful ways, and understanding the limits of what can be computed.
The Approaches to Statistical Computing Programs
There are three major approaches to statistical computing programs.
- Single programs (such as the Biomedical Data Program (BMDP) developed at the University of California at Los Angeles) comprise collections of statistical programs that require the user to do little more than input and output the data in order to run statistical analyses and acquire usable results.
- A second approach is the statistical system or large package program. These are very complex programs that allow users to perform a wide range of statistical analyses by giving the computer instructions in the special language of the system. The Statistical Program for the Social Sciences (SPSS) package and the British General Statistical Program (GENSTAT) program are examples of this category of programs. These programs can be more useful to frequent users who fully understand the system's language and have a good understanding of the system's strengths and weaknesses. However, these requirements also mean that this category of program can be difficult to use for those who do not use it regularly.
- The third approach to statistical computing is the development of a collection of statistical algorithms (i.e., sequences of well-defined, unambiguous, simple instructions in the form of mathematical and logical procedures that inform a computer how to solve a problem) which are combined into programs. If a convenient method can be found to do this, the algorithmic approach can be very flexible in meeting the needs of the user.
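To make the algorithmic approach concrete, the following sketch shows one well-known statistical algorithm expressed as a sequence of well-defined, unambiguous instructions. Welford's online method for the mean and variance is an illustrative choice of the author of this example, not one named in the text above; it is also valued in practice because it limits the rounding problems discussed later in this article.

```python
def running_mean_variance(data):
    """Welford's online algorithm: a numerically stable single-pass
    computation of the mean and sample variance of a data set."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n          # update the mean incrementally
        m2 += delta * (x - mean)   # update squared deviations
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return mean, variance

mean, var = running_mean_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, var)  # mean 5.0, sample variance 32/7 ~ 4.571
```

Because each step is simple and unambiguous, such an algorithm can be combined with others into larger programs, which is the flexibility the algorithmic approach offers.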
Although computers can reduce the errors that can be introduced into calculations by humans, they have their own set of associated errors that may be introduced into the calculations. Although computer storage capabilities are increasing and becoming less expensive, in the end, computers still have limited storage space. To help maximize the use of this space, computers typically store only the most significant digits of data. The actual number of digits that can be stored is determined by the word length of the particular computer. For example, although many people are taught in school that the mathematical concept pi (the ratio of the circumference of a circle to its diameter) is equal to 3.14 or 3.141 or 3.1416, it is actually an infinite decimal which cannot be computed exactly. Therefore, these numbers are merely approximations of the value of pi. To store the value of pi, therefore, a computer needs to truncate the number at some point, rounding it appropriately. In many instances, how this is done is not of particular importance to the outcome of the calculations in which it is used. In other instances, however, it is and can throw off the entire calculation, with the rounding error being magnified in subsequent computations.
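The storage limits described above can be observed directly. The short sketch below (the specific numbers are illustrative choices, not taken from the text) shows that a computer's binary floating-point format can only approximate many decimal values, including pi, and that a small stored error can be magnified by later arithmetic.

```python
import math

# A computer stores only finitely many significant digits, so even
# simple decimal fractions are held as binary approximations.
print(0.1 + 0.2 == 0.3)   # False: both sides carry rounding error
print(0.1 + 0.2 - 0.3)    # a tiny but nonzero residual

# Pi itself can only be stored approximately (here, a 64-bit double).
print(math.pi)            # 3.141592653589793 -- a rounded value

# Rounding error can be magnified by subsequent computations:
# subtracting two nearly equal numbers cancels the digits that were
# accurate and leaves mostly the stored rounding error behind.
a = 1.0000001             # stored only approximately
b = 1.0000000
result = (a - b) * 1e7    # ideally exactly 1.0
print(result)             # close to, but not exactly, 1.0
```

The relative error in `a` is near the limit of the storage format, yet after the subtraction and scaling it appears in much earlier digits of the result, which is the magnification effect the paragraph describes.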
Because of rounding error and other factors, computer results frequently contain some error. In most cases, this error is not large and the results are good enough for their purposes. However, being aware of the types of error that can appear in computations and how they are caused can help the user of statistical programs to anticipate potential problems and be better prepared to interpret the results.
Three Types of Computational Error
In general, there are three types of errors that can affect the results of computations:
- Blunders are gross errors or mistakes that are easily correctable if detected. Examples of blunders include a "bug" in a computer program or incorrect data input into the system. The other two types of errors, however, are less easily corrected.
- Errors due to the use of an approximate computation method occur when one uses a function or process that approximates a true function or process. For example, evaluating the first n terms in a series expansion of a function may yield results that are only approximations even if the calculations are carried out exactly. Errors due to approximation imposed by the computer result from the way that the computer performs its calculations. These include rounding errors and chopping of fractions in floating point operations. This...
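The first kind of approximation error mentioned above, evaluating only the first n terms of a series expansion, can be sketched as follows. The exponential series is an illustrative choice of this example, not one named in the text: summing finitely many terms of the Taylor series for e**x always yields an approximation, even when each individual step is carried out exactly.

```python
import math

def exp_series(x, n_terms):
    """Approximate e**x by summing the first n_terms of its Taylor
    series: 1 + x + x**2/2! + x**3/3! + ... (an approximate method)."""
    total = 0.0
    term = 1.0  # the k-th term x**k / k!, built up incrementally
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)
    return total

# The error shrinks as more terms are kept, but with finitely many
# terms the result remains an approximation of the true function.
x = 1.0
for n in (2, 5, 10):
    approx = exp_series(x, n)
    print(n, approx, abs(approx - math.exp(x)))
```

This is distinct from rounding error: even on an ideal machine with exact arithmetic, stopping the series after n terms would still leave a truncation error.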