Classroom Evaluation Research Paper Starter

Classroom Evaluation

(Research Starters)

Classroom evaluation is a crucial part of the learning process because it is used to measure and improve both student learning and the quality of classroom instruction. Classroom evaluation encompasses the procedures used by teachers to determine if, and to what degree, their students can demonstrate a skill, behavior, or body of knowledge. Over the past century, the number of assessment methods has grown exponentially; teachers today can avail themselves of a variety of traditional and alternative assessment methods to evaluate and improve the quantity and quality of students' skills and knowledge.





The Joint Committee on Standards for Educational Evaluation defines evaluation as "the systematic investigation of the worth or merit of a student's performance in relation to a set of learner expectations or standards of performance" (2003, p. 228). Teacher evaluations measure student achievement by scoring students on a set of explicit expectations and then placing their scores on a scale that differentiates the degrees to which students demonstrate a skill, body of knowledge, or behavior. Evaluations can measure cognitive outcomes (of demonstrated knowledge) as well as affective outcomes (higher-order connections) (McMillan, 2001; Marzano, 2001; Popham, 2001). The feedback evaluations provide to instructors plays a prominent role in every facet and at every level of curricular development (Beane, Toepfer, & Alessi, 1986).

Assessment methods are the techniques and strategies teachers use to collect significant information about students' skills, body of knowledge, or behavior in order to perform their evaluations. Like evaluation, the process of assessment is central to instruction. Twenty-first century curricula demand that teachers be adept at using a variety of classroom assessment methods (American Association for the Advancement of Science, 2000; Joint Committee on Standards for Educational Evaluation, 2003; Littke & Grabelle, 2004; Marzano, 2001; Wiggins, 1998).


Curriculum evaluation moved through a number of stages during the late nineteenth century and throughout the twentieth century. The scientific experimentation and measurement movement with its emphasis on grading, marking, and judging dominated the late 1800s and early 1900s. The measurement movement grew steadily throughout the 1920s, 1930s, and 1940s. Evaluation expanded beyond measurement with the so-called "Eight-Year Study" (1933-1941). However, the grading-marking-judging approach to classroom evaluation has guided educators throughout the history of education and is still widespread in the early twenty-first century (Schubert, 1986).

A shift came during the 1940s when the U.S. Office of Strategic Services began using a multiform, holistic system of assessment to identify the military personnel best suited for espionage behind enemy lines. The system was described in the agency's report Assessment of Men: Selection of Personnel for the Office of Strategic Services. These assessments used an assortment of procedures and tests to evaluate not only individuals' capabilities and skills, but also their personal characteristics and attitudes. By assessing multiple aspects, the multiform assessments derived a fuller picture of individuals' physical, mental, and emotional abilities and, thereby, their overall suitability for espionage (Office of Strategic Services, 1948; Wiggins, 1993).

The 1950s brought Benjamin Bloom's Taxonomy of Educational Objectives, one of the most widely used tools for teachers' classroom evaluation of student learning. This book provided teachers with multiple sets of behavioral verbs, helping them write performance descriptions, expectations, or desired performance outcomes, and higher-level reasoning tasks for students (Bloom, 1956).


Pressure from the accountability movement emerged in the late 1960s and early 1970s and grew in intensity until it began to dominate evaluation and assessment. After A Nation at Risk was published, the 1980s brought more rigorous requirements intended to improve students' test scores and their performance in college (National Commission on Excellence in Education, 1983; Schubert, 1986). The report sounded the call for accountability in education and exposed the failures of the U.S. educational system.

Evaluation for accountability purposes became the watchword of the 1980s and 1990s. The accountability movement has extended up to the present day, culminating in standards-based assessments and large-scale, high-stakes, standardized state testing programs. Despite the dominance of accountability and standardized testing, the twenty-first century has also brought an expansion in the types of evaluation data that are considered valid for assessments. The influence and use of technology in authentic, contextual learning environments have led to dissatisfaction with existing classroom evaluation methodologies, spurring the development of innovative new methodologies (Kovalik & Dalton, 1997; Marzano, 2001).


Assessments used in classroom evaluation need to focus not only on methods and techniques, but also on purposes. The type of classroom assessment used needs to be matched with its purpose(s). Classroom evaluations are certainly used to document what students have learned and to produce information about student progress, but classroom evaluations are also needed to provide teachers with information for making instructional decisions (Popham, 2001). The student-performance data emanating from classroom evaluations provide teachers with feedback on the effectiveness of their teaching procedures, helping them improve instruction and student learning. The daily classroom cycle of performance and feedback produces most student learning and most of the improvement in schools (Lewis & Doorlag, 1987; Wiggins, 1998).

A schematic developed and illustrated in Popham's 2001 book, The Truth About Testing: An Educator's Call to Action, modified and reproduced here as Figure 1, shows the ideal relationships among assessment, instruction, content, inferences, and decisions.

Teachers use assessments to sample students' knowledge and skills. Sampling the larger body of content, teachers are able to use assessments to evaluate the degree to which students have mastered the body of content as a whole. They then rely on these evaluations to make decisions on how best to teach students (Popham, 2001).

Instruction can move forward when classroom evaluations, based on the use of assessments, show that students have learned what they should (Gage & Berliner, 1988). For example, if a classroom teacher requires students to write two persuasive essays on a final exam and they perform well on both essays, then the teacher can reasonably infer that students are capable writers of persuasive essays (Popham, 2001). Remedial instruction is required if classroom evaluation reveals that some or all of the students have not acquired the skills that were taught. In some cases, evaluation indicates that the entire instructional sequence must be started anew. This requires rethinking educational objectives, student characteristics, teaching strategies, and the learning process (Gage & Berliner, 1988).


Traditional vs. Alternative Assessment

Educational evaluation draws on a broad array of assessment approaches, both traditional and nontraditional. Traditional assessments measure student learning with paper-and-pencil and electronic tests, while alternative assessments seek to measure learning along with students' ability to reason and think critically. Both types of assessments can use objective, selected-response items and subjective, constructed-response items. Selected-response items limit the range of student responses through, for example, traditional multiple-choice problems or alternative self-assessment. Constructed-response items allow students to create their own responses to assessment prompts and can include traditional essays and alternative performance assessments (see Table 1).

Table 1: Classification of Different Methods of Assessment

Type of response: Selected-response (objective-type assessments in which responses or answers are chosen from those provided)

* Traditional: multiple-choice, true-false, matching, binary-choice

* Alternative: structured observation, structured interview, surveys, student self-assessment

Type of response: Constructed response (subjective-type assessments in which responses or answers are supplied or constructed)

* Traditional: sentence completion, short answer, essay

* Alternative: anecdotal observation, unstructured interview, papers, reports, performance assessment, authentic assessment, portfolio assessment, exhibitions, demonstrations, student self-assessment

(Modified from McMillan, 2001)

A variety of traditional and nontraditional assessment methods, techniques, and formats are used in K-12 classrooms to gather evaluative information and sources of evidence. Some are used more in certain subject areas than in others, but most can be used across disciplines. These approaches include homework assignments; teacher-developed, paper-and-pencil quizzes and tests; teacher observations; journals; lab notebooks; and essays and writing samples (American Association for the Advancement of Science, 2000; Joint Committee on Standards for Educational Evaluation, 2003; Marzano, 2003; National Research Council, 2000; Wiggins, 1993). The specific assessment option(s) selected and implemented in a given circumstance should be the one(s) that will provide the best source(s) of evidence (American Association for the Advancement of Science, 2000; McMillan, 2001; Popham, 2001).

Quantitative vs. Qualitative Assessments

A second classification scheme subdivides assessment methods into two major types:

* quantitative assessments--those using measurement, and

* qualitative assessments--those using nonnumerical descriptions

Quantitative assessments evaluate student learning as either a raw score or a standard score. Raw scores are determined from the number or percentage of test items a student answers correctly. Standard scores are determined by comparing a student's raw score to those of other students and assigning a percentile rank (McMillan, 2001). Both standardized tests and the tests and quizzes developed by teachers are quantitative assessments.
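The difference between a raw score and a percentile rank can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the percentile-rank convention used here (the share of scores below a student's score, plus half of any ties) is one common choice that the sources cited above do not specify.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def raw_score_percent(correct: int, total: int) -> float:
    """Raw score expressed as the percentage of items answered correctly."""
    if total <= 0:
        raise ValueError("total must be a positive number of test items")
    return 100.0 * correct / total


def percentile_rank(score: float, group_scores: Sequence[float]) -> float:
    """Percentile rank of `score` within a comparison group.

    Assumed convention (not taken from the source): percentage of group
    scores strictly below `score`, plus half of the scores equal to it.
    """
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)           # scores strictly below
    ties = bisect_right(ordered, score) - below   # scores equal to `score`
    return 100.0 * (below + 0.5 * ties) / len(ordered)


if __name__ == "__main__":
    # Hypothetical class results on a 20-item quiz, as percent correct.
    class_scores = [60.0, 70.0, 80.0, 90.0, 100.0]
    student = raw_score_percent(correct=18, total=20)
    print(student, percentile_rank(student, class_scores))
```

The same raw score can correspond to very different percentile ranks depending on the comparison group, which is why the two kinds of score answer different questions about a student's performance.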

Standardized assessments include norm-referenced tests, which compare the test taker to a sample of his or her peers to derive a standard score, and criterion-referenced tests, which use a raw score to measure the degree to which the test taker possesses a trait or skill (Popham, 1975, cited in Measurement and Evaluation). Criterion-referenced standardized assessments thoroughly measure specific skills and knowledge--such as geometry or American literature--while norm-referenced standardized assessments are designed to measure a broader variety of...
