Testing & Evaluation Research Paper Starter

(Research Starters)

A test is composed of questions aimed at gathering information about a topic of interest. Evaluation, also referred to as assessment, is a process that usually consists of one or more tests along with other methods of obtaining data, such as interviews or observations. Over the past few decades, testing and evaluation have become ubiquitous in American schools, informing high-stakes decisions like special education placements and federal funding allocations. Critics point out that tests can contain hidden biases that put some racial, ethnic, and economic groups at a disadvantage, and that high-stakes tests do not sufficiently examine individual students' abilities and achievements.

Keywords: Achievement Gap; Evaluation; Formative Evaluation; High-Stakes Testing; No Child Left Behind Act of 2001 (NCLB); Summative Assessment; Test Bias; Testing


Slightly over two decades ago, Linn (1986) stated that testing was "ubiquitous" in the field of education. In the years since, the prevalence and relevance of testing and evaluation in education have certainly not declined and have arguably increased. From testing mandated by the No Child Left Behind (NCLB) Act (U.S. Department of Education, 2002) to the integral role of evaluation in special education, testing and evaluation are indeed key aspects of education.

Fremer and Wall (2003) provide useful definitions of testing and evaluation. In general, a test is composed of questions aimed at gathering information about a topic of interest. Evaluation, also referred to as assessment, is a process that usually consists of one or more tests along with other methods of obtaining data such as interviews or observations. School personnel, including teachers, counselors, administrators, and psychologists, frequently use both testing and evaluation to determine students' progress and achievements.

Educational Testing

Currently, educational testing "typically refers to group-administered standardized tests of subject area knowledge or academic skills, abilities, or aptitudes" (Scheuneman & Oakland, 1998, p. 77). Educational testing can be traced back to Horace Mann's common exam in 1845, through the development of individual intelligence tests for schools and the military in the early 1900s, and up to the high-stakes testing that characterizes the No Child Left Behind Act today (Gallagher, 2003; Geisinger, 2000).

Over time, the educational testing process has evolved to the point where, after taking a test, an individual can interpret his or her score by comparing it with those of others (Fremer & Wall, 2003). For instance, when a classroom teacher decides to give a test to students, he or she will first create the test from material covered in class, then administer and grade the test (perhaps on a scale of 1 to 100), and finally inform students of the grade they earned. However, the 75% score a student receives on this test might not be comparable to the 75% score a student in the same grade but in another classroom (in the same school or in another state) receives on his or her test. Though the tests might have covered the same material, they may have asked different questions or been graded differently. Standardized tests can correct these differences.

Standardized tests are tests that are administered, scored, and interpreted in a systematic manner. A norm-referenced standardized test is developed with a normative reference group, an approximation of the existing population of interest, which allows for comparisons between students across classrooms, schools, and regions of the country. A criterion-referenced standardized test, on the other hand, compares individual test scores against a predetermined standard. Scores are interpreted on the basis of whether or not they meet this standard, rather than by comparison with the scores of other test takers (Russell, Goldberg, & O'Connor, 2003).
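The two ways of interpreting a score can be illustrated with a short sketch. It is not drawn from the cited sources: the norm-group scores, the cutoff of 80, and the function names percentile_rank and meets_standard are all hypothetical, and the percentile rank formula (counting ties as half) is only one common convention.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below
    `score`, counting tied scores as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def meets_standard(score, cutoff):
    """Criterion-referenced reading: does the score reach the preset standard?"""
    return score >= cutoff


# Hypothetical data, invented for illustration only.
NORM_GROUP = [52, 58, 61, 64, 67, 70, 73, 75, 78, 81, 84, 88, 91, 95, 97]
CUTOFF = 80
student_score = 75

rank = percentile_rank(student_score, NORM_GROUP)
passed = meets_standard(student_score, CUTOFF)
print(f"Percentile rank in norm group: {rank:.1f}")
print(f"Meets the standard of {CUTOFF}: {passed}")
```

In this made-up case the same score of 75 sits in the middle of the norm group yet falls short of the preset standard, which is why the two kinds of tests can lead to different conclusions about the same student.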

Other important constructs related to test construction and interpretation are reliability and validity. Reliability is achieved when test measurement is consistent over time and across administrations; validity describes the degree to which a test accurately measures the concept it is meant to measure (Paris, 1998).
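One common way to quantify consistency over time is a test-retest correlation: the same students take the same test twice, and the two sets of scores are correlated. The sketch below assumes that approach, which the text above does not name specifically, and uses invented scores. The function name pearson_r is illustrative. It addresses reliability only; validity cannot be established from a single correlation of this kind.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five students on two administrations of one test.
first_administration = [70, 82, 65, 90, 77]
second_administration = [72, 80, 68, 91, 75]

r = pearson_r(first_administration, second_administration)
print(f"Test-retest correlation: {r:.2f}")
```

A coefficient close to 1 suggests that students kept roughly the same relative standing across the two administrations; a coefficient near 0 would suggest the test is measuring inconsistently.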

According to Scheuneman and Oakland (1998), the testing process can be characterized along the following dimensions:

• Why the test is administered (for example, to measure intelligence or vocational aspirations),

• Who administers the test (such as a psychologist or teacher),

• How the test is administered (for instance, individual or group format or on a computer),

• Who the test is administered to (a group or individuals), and

• Where the test is administered (for example, a school or testing facility).

After a test is selected and administered, the results can be used in a number of ways.

Educational Evaluation

Evaluation, also referred to as assessment, is distinguished from testing in that evaluation usually entails more than one test as well as other methods of information-gathering, such as interviews or observations. Educational evaluation can be used to determine accountability, assist with development, and garner knowledge (Macdonald, 2006). Additionally, evaluation can be described as either formative or summative. Moon (2005) describes formative evaluation as a continual process of student assessment that serves to guide instruction. She describes summative evaluation as a means to assess instruction once some facet of it is completed, such as at the end of a lesson, semester, or school year. Standardized tests are an aspect of summative evaluation; teacher observations and methods such as portfolio-based assessment (a collection of representative student work based on the classroom curriculum) are considered components of formative evaluation (Paris, 1998).


Applications of Educational Testing

The Code of Fair Testing Practices in Education (Code, 2004) lists admissions, assessment, diagnosis, and placement as areas in which the results of educational testing can be applied. Fremer and Wall (2003) add accountability evaluations, judging progress and following trends, and self-discovery as other uses for educational testing results.

• Private and selective public schools as well as colleges and universities may require applicants to take tests like the SSAT, ACT, or SAT. Applicants' scores are then used to determine their eligibility for admission.

• Educational tests also play an integral role in educational evaluation or assessment since the results of such evaluations are used to place students in special education programs.

• Diagnostic tests provide insight into student achievement and adjustment. Counselors and teachers use information gleaned from diagnostic tests to make decisions about student support services related to emotional or academic needs.

• Accountability evaluation uses test results to determine if a student has met academic expectations.

• Student progress can be tracked if a test is repeatedly and regularly administered within the same testing environment. This method can also identify any trends in a student's abilities.

Test accommodations can change:

• How the test is presented to the individual,

• How an individual might respond to the test,

• The length of time given for the test, and

• The test setting (Pullin, 2002).

Vacc and Tippins (2002) describe accommodations as either physical (or basic) or special. A physical accommodation might be changing the test setting, while allowing a student more time to take a test would be a special accommodation. Vacc and Tippins also identify test exemptions as a type of accommodation. Test exemptions are granted in instances in which an appropriate accommodation necessary to maintain...
