What are ability tests?

Quick Answer
Ability testing assesses the capabilities of people, typically measuring qualities such as intelligence. Exactly what is measured and how, as well as what test results mean, have been the subject of debate.
Expert Answers
eNotes educator | Certified Educator

Whatever intelligence may be, the first scientific attempts to measure it were conducted by French psychologist and physician Alfred Binet. From 1894 until his death, Binet was director of the psychology laboratory at the Sorbonne. Between 1905 and 1911, Binet and his colleague Théodore Simon devised a series of tests that became the basis for tests in many areas. The Stanford, Herring, and Kuhlmann tests are among the revisions to Binet and Simon’s tests. Binet, unlike many of his contemporaries in psychology, was interested in how normal minds work, rather than in mental illness. It was his goal to discover inherent intelligence, apart from any educational influence.

Binet came to develop his tests through observation of his daughters. He was interested in how they solved problems that he set for them. Binet noted the existence of individual differences and the fact that not all thought processes use the same operational path. Binet argued that lack of ability in specific fields was not a mental illness. There were also, he noted, different types of memory. This discovery led to his work with Simon on achievement levels for “normal” children.

Binet’s first test, carried out in 1905, asked children to follow commands, copy patterns, name objects, and put things in order or arrange them properly. He administered the test to students in Paris and based his standard on the resulting data. Thus, if 70 percent of a certain age group succeeded on a given task, those who passed at that level were at that mental age level. The term “intelligence quotient,” or IQ, came later: German psychologist William Stern proposed it in 1912 as a way of expressing scores on Binet’s scales. IQ is the ratio of mental age to chronological age multiplied by 100, with 100 being average. For example, an eight-year-old who succeeds on the ten-year-olds’ test would have an IQ of 10/8 × 100, or 125. Soon there was widespread enthusiasm for testing and finding IQ scores, and a number of measures were introduced. The US Army used tests to sort recruits in World War I. The tests assessed general knowledge rather than ability on specific tasks.
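The ratio described above can be written as a short function. This is only a minimal sketch of the arithmetic; the function name ratio_iq is an illustrative assumption and does not come from the original tests.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# The eight-year-old from the text who passes the ten-year-olds' test.
print(ratio_iq(10, 8))  # 125.0

# A child whose mental age equals his or her chronological age is average.
print(ratio_iq(8, 8))  # 100.0
```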

Binet’s tests required modifications. The first, and perhaps most famous, was the Stanford-Binet test, developed in 1916 by Lewis Terman. It was immediately put to use by educational, government, and other agencies. This test is based mainly on verbal ability and reports its result as an IQ. Terman worked to overcome the limitations of the age-scale principle of testing because he wanted to measure the full range of intelligence. Binet’s scales had two major shortcomings in measuring adult intelligence. First, an older person’s score became meaningless when divided by his or her chronological age, so Terman assigned the chronological age of fifteen to everyone over sixteen. Second, the scales lacked items that could measure high intelligence. Terman added such items, assigning them mental age levels up to twenty-two. This enabled him to measure IQs of older children and young adults.
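A small sketch can show why dividing by an adult’s true age is a problem and what the fixed adult age does to the result. The constants come from the paragraph above; the function name, the cap_adults flag, and the forty-year-old example are assumptions made for illustration, and the handling of someone who is exactly sixteen is a guess, since the text does not specify it.

```python
ADULT_AGE_CUTOFF = 16  # people older than this are treated as adults (from the text)
ADULT_DIVISOR = 15     # chronological age assigned to adults (from the text)


def ratio_iq_with_adult_cap(mental_age, chronological_age, cap_adults=True):
    """Ratio IQ, optionally replacing an adult's true age with a fixed divisor."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    divisor = chronological_age
    if cap_adults and chronological_age > ADULT_AGE_CUTOFF:
        divisor = ADULT_DIVISOR
    return mental_age / divisor * 100


# Hypothetical forty-year-old who passes items at mental age twenty.
uncapped = ratio_iq_with_adult_cap(20, 40, cap_adults=False)
capped = ratio_iq_with_adult_cap(20, 40)
print(uncapped)          # 50.0, an implausibly low score for an able adult
print(round(capped, 1))  # 133.3

# Highest score reachable with items that go up to mental age twenty-two.
print(round(ratio_iq_with_adult_cap(22, 40), 1))  # 146.7
```

Without the cap, the quotient keeps shrinking as the person grows older even though ability has not declined, which is the sense in which the score becomes meaningless.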

There were additional revisions of the Stanford-Binet test. In 1937, for example, Terman and Maude Merrill published a revision of the test based on the same principles as the 1916 examination. However, they improved the selection of items and method of standardization. Merrill published another revision in 1959. These revisions have found wide acceptance, also serving as models for other individual IQ tests and as a means for checking their scales.

The Wechsler scale, introduced in 1939, includes both verbal and performance measures. An individual’s scores are compared with those of others of the same age to yield an IQ score. The Wechsler-Bellevue adult scale uses a derived IQ to measure the intelligence of people between the ages of seven and seventy, comparing each person’s scores with standards for his or her age group. Wechsler produced two other scales: the Wechsler Intelligence Scale for Children, published in 1949 and designed for children age five to fifteen, and the Wechsler Adult Intelligence Scale, published in 1955 for people from sixteen to sixty-four, with a special standardization for people age sixty to seventy-five.

Originally, all intelligence tests were individual tests, meaning that they were given in a one-to-one situation. However, as the military and other large organizations became involved in testing, large-scale group tests were introduced. Individual tests tend to be more accurate, because an examiner is more likely to note the mood of a test taker in a one-to-one setting than in a group setting. Individual tests are more likely to be administered to those who are thought to be either gifted or intellectually disabled. Group tests are more common in educational and military settings.

There is a good deal of dispute regarding the nature of intelligence and whether it can be measured in a quantitative fashion. Additionally, since the 1930s, there have been a number of virulent disputes regarding the role of genetics and environment in determining IQ, often termed the nature-nurture debate. Most psychologists concede that because environments are never uniform and the expression of genes is elastic, the argument for one or the other element as the sole determinant of intelligence is flawed. Thus, intelligence, whatever it may be, is a function of both nature and nurture, of environment and genetic makeup.

Twin studies estimating environmental effects put genetic factors pertaining to “intelligence” at somewhere below 50 percent. However, wide variation exists according to the particular characteristic of intelligence under study. Indeed, later views of intelligence hold that many different abilities make up intelligence. The question for psychometrics, the field concerned with measuring intelligence, is how to measure specific and general intelligence. Researchers note that there are many skills involved in both academic and professional success. For example, spatial intelligence is related to success in mathematics, science, engineering, architecture, and related fields, while it is not as important in literature or music.


Approaches to the study of intelligence include psychological measurement, often called psychometrics; cognitive psychology; the merger of cognitive psychology with contextualism; and biological science, which considers the neural bases of intelligence. Psychometric theories have been most concerned with the quantification of intelligence and its parts. Psychometricians generally seek to understand the structure of intelligence, that is, the forms it may take and the relationship between any parts it may have. These theories are tested through paper-and-pencil tests that include analogies, classifications, and series completions.

The psychological model on which these tests are based states that intelligence is made up of abilities that mental tests measure. Each test score is based on a weighted composite of scores taken from the underlying abilities. The mathematical model is additive and assumes that less of one type of ability can be compensated for by more of another ability.
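The additive model can be illustrated with a weighted sum. The sketch below is not taken from any particular test; the ability names, weights, and scores are hypothetical values chosen only to show how a weakness in one ability can be offset by strength in another.

```python
from typing import Mapping


def composite_score(abilities: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted additive composite of underlying ability scores."""
    return sum(weights[name] * abilities[name] for name in weights)


# Hypothetical weights and ability scores (illustrative only).
weights = {"verbal": 0.5, "spatial": 0.5}
balanced = {"verbal": 100, "spatial": 100}
uneven = {"verbal": 120, "spatial": 80}

# Both profiles yield the same composite: the higher verbal score
# compensates for the lower spatial score.
print(composite_score(balanced, weights))  # 100.0
print(composite_score(uneven, weights))    # 100.0
```

Because the model only adds weighted terms, two quite different ability profiles can produce an identical test score, so the score alone does not reveal which abilities lie behind it.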

Charles Spearman, who put forth the first psychometric theory, published his first major article on intelligence in 1904. Spearman noted that people who do well on one mental ability test generally do well on others and, conversely, those who do poorly on one test tend to do poorly on others. Spearman’s factor analysis enabled him to posit that there are two major factors underlying intelligence. The first and more important factor is the “general factor,” or g. The second factor is that which is specifically related to each particular test. Spearman was not sure what g was, but he did posit that it was “mental energy.”

L. L. Thurstone disagreed with both Spearman’s theory and with his isolation of a single factor of general intelligence. Thurstone argued that Spearman’s misapplication of his factor method led him to find just one factor, the g factor. He argued that there are seven primary mental abilities underlying intelligence: verbal comprehension, verbal fluency, number, spatial visualization, inductive reasoning, memory, and perceptual speed.

Psychologists such as Philip E. Vernon and Raymond B. Cattell argued that in some senses both Thurstone and Spearman were correct. Their reasoning is that abilities are arranged in a hierarchy. General ability, or g, is at the summit. The other abilities relate to ever more specific tasks as a person descends the hierarchy. Cattell went on to suggest that there are two major categories of abilities, fluid and crystallized. Fluid abilities, reasoning and problem solving, are measured by tests such as analogies, classifications, and series completions. Crystallized abilities, derived from fluid abilities, include vocabulary, general information, and knowledge about specific fields. Most psychologists agreed that a broader subdivision of abilities was needed than was provided by Spearman, but not all of them agreed that the subdivision should be hierarchical. The structure-of-intellect theory devised by J. P. Guilford, for example, postulated 120 abilities. He later increased the number to 150.

In general, it was becoming obvious to many that psychometric theory had problems. The number of factors had gone from 1 to more than 150, and none of the proposed sets of factors offered a satisfactory explanation of overall intelligence.

Twin Studies

Twin studies use two methods to measure the effect of nature and nurture on overall intelligence. The first method examines identical twins reared apart, and the second looks at the differences between identical twins reared together and fraternal twins reared together. Identical, or monozygotic, twins are not totally identical, because they have had different experiences and are unique social and cultural products. Fraternal twins are formed from two different fertilized eggs, just as normal siblings are. Unrelated children reared together are also studied.

Although most identical twins studied show a 50 to 80 percent genetic contribution to intelligence, a closer examination reveals identical pairs with up to a twenty-point difference in IQ scores. This occurs when the environment is drastically different. The closeness of most identical twins is a result of nature and nurture; that is, the twins being raised in similar settings.

It has been reasonably obvious that many of the skills measured by IQ tests can be taught just as any other skills can be taught. If these skills can be taught, then at least part of what is measured by ability tests, including IQ tests, is learned and not inherent.

Specific Ability Tests

Among the more common ability tests are the School and College Ability Test (SCAT) and the Sequential Test of Educational Progress (STEP). The SCAT measures specific abilities in verbal and quantitative areas. It is used to make general, overall decisions about level and pace of instruction. The SCAT focuses on aptitude, not specific educational goals. The STEP battery measures actual achievement in reading, written language, and mathematics. STEP measures actual mastery and is, therefore, useful in indicating skills a student is ready to master.

Both SCAT and STEP testing can be used for in-grade-level or above-grade-level testing. In-grade-level testing provides information compared with others in the same grade, while above-grade-level testing indicates probable success or failure compared with those in higher grades.

SCAT assesses both verbal and mathematical reasoning abilities, using verbal analogies and quantitative comparison items. STEP mathematics computation measures a broad variety of computational skills, ranging from operations with whole numbers, fractions, and percentages to evaluation of formulas and manipulations with exponents. STEP mathematics basic concepts measures knowledge of various concepts, including those involving numbers and operations; measurement and geometry; relations, functions, and graphs; and proofs. It also includes knowledge of probability and statistics, mathematical sentences, sets and mathematical systems, and application. STEP reading measures the capacity to read and appreciate a wide range of written materials. STEP English expression measures the ability to assess the accuracy and efficiency of sentences by requiring the student to detect mistakes in grammar and usage or to choose among rewordings of sentences.

The SAT Reasoning Test is a widely used aptitude test that attempts to measure both intelligence and ability to undertake college studies. There are verbal and mathematical components to the test. The mean score on each test is 500, and each has a standard deviation of 100. The test was standardized on a group of ten thousand students in 1941. However, when scores dropped in the 1990s, with a verbal mean of 422 and a mathematical mean of 474, there was a readjustment of means. Educators attributed these lower scores of the student population to television and to deterioration in home and school situations.
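A scale with a mean of 500 and a standard deviation of 100 can be read as a standard score. The sketch below shows only that arithmetic; it is a simplified linear conversion, not the equating procedure actually used to score the SAT, and the function names are assumptions.

```python
SCALE_MEAN = 500  # mean of the reference group (from the text)
SCALE_SD = 100    # standard deviation of the reference group (from the text)


def to_scaled(z_score: float) -> float:
    """Convert a standard (z) score to the 500/100 scale."""
    return SCALE_MEAN + SCALE_SD * z_score


def to_z(scaled_score: float) -> float:
    """Express a scaled score in standard deviations from the reference mean."""
    return (scaled_score - SCALE_MEAN) / SCALE_SD


# One standard deviation above the reference mean.
print(to_scaled(1.0))  # 600.0

# Where the later means from the text fall relative to the 1941 reference group.
verbal_gap = to_z(422)  # about 0.78 standard deviations below the mean
math_gap = to_z(474)    # about 0.26 standard deviations below the mean
print(round(verbal_gap, 2), round(math_gap, 2))
```

Seen this way, the readjustment of means amounted to re-anchoring the scale so that the average of the later test-taking population again sat near 500.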


IQ and other ability tests have been widely criticized, especially since the 1960s. These controversies have centered on the Eurocentric nature of the tests; namely, they have been designed primarily for use with white, middle-class children. The tests, therefore, have drawn fire from critics for being culture-bound. Some have seen them as unfair to African Americans, Latinos, and members of other minority groups. However, attempts to create culturally neutral tests have failed, and the tests have withstood court challenges. In Parents in Action on Special Education (PASE) v. Hannon (1980), a US District Court case involving Chicago schools, the court ruled that the tests were not culturally biased and could be used to place children in special education courses.

These concerns over cultural bias, however, have raised another, related issue. That issue goes to the heart of IQ testing and concerns exactly what the tests measure. Critics argue that the tests do not measure mental abilities. The tests, they say, do not show how children arrive at their answers, only whether they are right or wrong. Knowing how a child arrives at an answer would better allow evaluators to gauge intelligence, for those who arrive at a right answer by guessing are not necessarily more intelligent than those who get the wrong answer but whose reasoning is sound. Additionally, people from different cultural backgrounds have different but equally valid ways of approaching problems. Westernized tests do not take these skills into account.

Moreover, there is still a debate concerning the relative impact of nature and nurture on intelligence. Those who hold for the predominant role of heredity have used comparative test results to argue for the dominant role of genetic differences among the various ethnic groups. In the early 1970s, the published research of Nobel Prize–winning physicist William Shockley of Stanford University and educational psychologist Arthur R. Jensen of the University of California concluded that heredity accounts for most differences in intelligence among different racial groups. This conclusion caused a great controversy, matched by the publication of The Bell Curve (1994) by Richard Herrnstein and Charles Murray, which came to much the same conclusions: Intelligence is primarily inherited, and there are different levels of intelligence among races.

Another controversy concerns the tendency of most tests to take a holistic approach to intelligence. The Stanford-Binet test, for example, treats intelligence as a unified trait. In the minds of many critics, IQ tests are designed to measure a particular type of ability defined by the predominant class. Because the tests are culturally biased, these critics say, scores do not reflect an objective, universal pattern of intelligence; intelligence, they argue, is a social construct. As an alternative to the unified view, Guilford devised a 180-factor model of intelligence, which classified each intellectual task according to three dimensions: content, mental operation, and product. This theory is a predecessor to Howard Gardner’s theory of multiple intelligences, first published in 1983.

Because of the influence of social scientists who have argued for the importance of cultural differences, tests are no longer the only basis for evaluating intellectual performance. Most psychologists are now much more aware of the role of motivational and cultural factors in development.

Response to Criticism

Intelligence tests seek to measure intellectual potential by using novel items, forcing test takers to think on the spot. The point is to avoid tapping factual knowledge. It is understood by psychologists that people come from different backgrounds, so it is difficult if not impossible to find items that are totally novel. Therefore, test makers require test takers to use relatively common knowledge. It is impossible to control for all of a test taker’s prior knowledge. Therefore, intelligence scores represent a blend of potential and knowledge.

IQ tests have reliability correlations in the range of 0.90 and above, which is higher than most other psychological tests. High reliability does not rule out misleading scores, however: variations in motivation or anxiety can still distort an individual result. IQ tests are also valid when used to predict success in academic work. They are, therefore, good predictors of school success, but they are not good at predicting other types of success. Many people believe that these tests measure general mental ability, when they actually focus on abstract reasoning and verbal fluency, the types of skills needed for academic success. They do not measure either social or practical intelligence. IQ scores do not stabilize until adulthood, and even then, they can change. There is a high correlation between high IQ scores and being in a prestigious occupation. Specific success in any given occupation, however, cannot be predicted in a meaningful way.
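One common way to obtain a reliability figure is to give the same test twice to the same people and correlate the two sets of scores (test-retest reliability). The sketch below computes a Pearson correlation by hand; the function name is an assumption and the scores are made-up numbers for illustration, not data from any real study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical scores for five people tested on two occasions.
first_sitting = [95, 100, 105, 110, 120]
second_sitting = [97, 99, 106, 112, 118]

reliability = pearson_r(first_sitting, second_sitting)
print(round(reliability, 2))  # about 0.98 for these made-up scores
```

A coefficient this high means that people keep roughly the same rank order across sittings. It says nothing about whether any single score was depressed by anxiety or low motivation on the day.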

IQ tests not only are stable, reliable, and valid but also predict academic success and occupational status. They are one good measure of giftedness and can be used with measures of creativity to aid recognition of this type of intelligence. They can also be used to identify which children should be placed in remedial classes.


It is essential to note that no psychological test should be used in isolation, whether that test is diagnostic of psychological and behavioral problems or of ability. Each test result needs to be compared with and used in conjunction with results from other tests. Trained psychologists need to evaluate the test results in context, whether these are diagnostic tests, intelligence tests, tests for evaluating emotional depression, or personality tests.

Much progress has been made since the era of the dominance of psychometric theories. Then, the study of intelligence was dominated by investigations of individual differences in people’s test scores. Lee Cronbach, a major figure in testing, bewailed the segregation of those who study individual differences and those who seek regularities in human behavior. He made his plea for a union of these studies in an address to the American Psychological Association in 1957. His call helped lead to the development of cognitive theories of intelligence.

Use of cognitive theories has aided in interpreting the results of ability tests, for they give an understanding of the processes underlying intelligence. These processes allow an evaluator to understand why someone may do poorly on various tests. It may not simply be a matter of poor reasoning, for example, that leads to poor performance on an analogies test. It may be that the student does not understand the words in the analogies. The different interpretations may lead to different recommendations. Someone who is good at reasoning but does not understand basic vocabulary requires an intervention that is different from that needed for someone who is a poor reasoner.

For cognitive psychologists, intelligence is a combination of a set of mental representations and a set of processes that can operate on them. Thus, ability tests based on these principles have sought to measure the speed of various types of thinking. There is, moreover, an assumption that processes are executed in a serial fashion. There are a number of cognitive theories of intelligence, but all of them assume a mental process working on a mental representation.

Several cognitive theories of intelligence have evolved. Among them is that of Earl B. Hunt, Nancy Frost, and Clifford E. Lunneborg, who demonstrated in 1973 that psychometrics and cognitive modeling could be combined. They started with tests that experimental psychologists used to study perception, learning, and memory. Individual differences in these tests were related to patterns of individual differences in IQ scores. They concluded that basic cognitive processes could be the basic components of intelligence.

Other developments led psychologists to begin with the psychometric tests themselves and to investigate the cognitive components of the skills they test. Once these basic components were isolated, they could be evaluated separately to compute their relationship with intelligence. This was done for information processing and computer modeling. Computer modeling, such as that of Allen Newell and Herbert Simon, uses a means-ends analysis to determine how close a problem is to a solution. Newell and Simon proposed a general theory of problem solving.

There are a number of psychologists who hold that information processing is parallel rather than serial. They argue that the brain processes information simultaneously, not in a serial fashion. It has proved difficult to construct ability tests to test this hypothesis. Moreover, the fact that intelligence differs from one culture to another, as Michael E. Cole has argued, has been ignored in psychometric tests. Additionally, psychometric tests are not good indicators of job performance.


Binet, Alfred, and Théodore Simon. The Development of Intelligence in Children. 1916. Reprint. Salem, NH: Ayer, 1983. Print.

Fish, Jefferson M., ed. Race and Intelligence: Separating Science from Myth. Mahwah, NJ: Erlbaum, 2002. Print.

Green, Anthony. Exploring Language Assessment and Testing: Language in Action. New York: Routledge, 2014. Print.

Gregory, Robert J. Psychological Testing: History, Principles, and Applications. London: Pearson, 2014. Print.

Herrnstein, Richard, and Charles Murray. The Bell Curve. New York: Simon, 1996. Print.

Lynn, Richard. The Global Bell Curve: Race, IQ, and Inequality Worldwide. Augusta, GA: Washington Summit, 2008. Print.

Minton, Henry L. Lewis M. Terman: Pioneer in Psychological Testing. New York: NYUP, 1988. Print.

Murdoch, Stephen. IQ: A Smart History of a Failed Idea. Hoboken, NJ: Wiley, 2007. Print.

Naglieri, Jack A., and Sam Goldstein, eds. Practitioner’s Guide to Assessing Intelligence and Achievement. Hoboken, NJ: Wiley, 2009. Print.

Plomin, Robert, et al. Behavioral Genetics in the Postgenomic Era. Washington, DC: APA, 2003. Print.

Shaffer, David R., and Katherine Kipp. Developmental Psychology: Childhood and Adolescence. Belmont, CA: Cengage, 2014. Print.

Urbina, Susana. Essentials of Psychological Testing. Hoboken, NJ: Wiley, 2014. Digital file.
