To yield usable data, surveys, assessment tools, and other data collection instruments need to be both reliable and valid. Reliability is a measure of the degree to which such instruments consistently measure a characteristic or attribute. Statistically, reliability concerns the observed variability in scores obtained on an instrument. Variability can come either from true variance (real differences in opinions, knowledge, or other characteristics of individuals) or from error variance. The total variability of a data collection or assessment instrument is the sum of the true variability and the variability due to error. Reliability can be estimated through the use of parallel forms of the instrument, repeated administration of the same form of the instrument, subdivision of the instrument into two parallel groups of items (split-half reliability), and analysis of the covariance among the individual items (internal consistency).
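The last two estimation approaches mentioned above can be sketched numerically. The following is a minimal illustration, not a full psychometric analysis: the item-response matrix is hypothetical, Cronbach's alpha stands in for "analysis of the covariance among the individual items," and an odd/even split with the Spearman-Brown correction stands in for "subdivision of the instrument into two parallel groups of items."

```python
import numpy as np

# Hypothetical item-response matrix: 6 respondents x 4 survey items
# (rows = people, columns = items; assumed values for illustration).
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
], dtype=float)

def cronbach_alpha(items):
    """Internal-consistency reliability from the covariance among items."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def split_half(items):
    """Split-half reliability: correlate odd- vs. even-numbered items,
    then apply the Spearman-Brown correction for full test length."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

print(f"Cronbach's alpha:            {cronbach_alpha(scores):.2f}")
print(f"Split-half (Spearman-Brown): {split_half(scores):.2f}")
```

Because the hypothetical respondents answer all four items consistently, both estimates come out high; an instrument with items that did not hang together would yield much lower values.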
In the case of data collection or assessment instruments, reliability (the degree to which the instrument consistently measures a characteristic or attribute) is a prerequisite for validity (the degree to which the instrument measures what it purports to measure). No matter how well written a data collection or assessment instrument appears to be on its face, it cannot be valid unless it is reliable. If a measure is not reliable, it does not consistently measure the same thing, so it cannot be measuring the construct it was designed to measure. Both validity and reliability are therefore essential when conducting survey research, so that the data collected in the study will actually give researchers the information they are trying to gather. Without both reliability and validity, the data collected are meaningless, and no conclusions can be drawn.
True Data Variance vs. Data Error
Even in the physical sciences, two sets of measurements performed on the same individuals never exactly duplicate each other. To the extent that repeated measurements disagree, the measurement instrument is unreliable, whether it is a physical scale used to measure the weight of a chemical compound or a paper-and-pencil survey used to measure a person's attitude toward something. For example, on a scale of 1 to 10, what one person describes as a 10 another person may call a 9.5. This does not necessarily mean that their opinions are different, just that the two people are expressing them differently. Some of the total observed variance (the square of the standard deviation) in scores is due to true variance, or real differences in the way that people are responding to the question. The other part of the total variance is due to error.
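This decomposition of observed variance into true variance plus error variance can be shown with a small simulation. The sketch below assumes a hypothetical population in which each person's observed score is a stable true score plus independent random measurement error; all numbers are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: observed score = true score + measurement error.
n = 100_000
true_scores = rng.normal(loc=7.0, scale=1.5, size=n)  # true variance ~ 2.25
errors = rng.normal(loc=0.0, scale=1.0, size=n)       # error variance ~ 1.00
observed = true_scores + errors

# Because true score and error are independent here, the total observed
# variance is approximately the sum of the two components.
print(f"true var:     {true_scores.var():.2f}")
print(f"error var:    {errors.var():.2f}")
print(f"observed var: {observed.var():.2f}")

# Reliability is the share of observed variance that is true variance.
reliability = true_scores.var() / observed.var()
print(f"reliability:  {reliability:.2f}")
```

In this simulated case roughly two-thirds of the observed variance reflects real differences among people; the rest is measurement noise that a more reliable instrument would shrink.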
Factors Affecting Reliability
There are many reasons why a data collection instrument may not be reliable and thus may contribute to the error variance. In general, what social scientists try to measure are lasting and general characteristics of individuals related to the underlying construct that the assessment instrument is trying to measure. However, other types of characteristics that are not part of the underlying construct, such as the individual's test-taking techniques and general ability to comprehend instructions, may also be measured.
In addition to the permanent characteristics of individuals, there are also temporary characteristics that can affect their responses to questions on data collection instruments. These might include such factors as general health, fatigue, or emotional strain, all of which can affect the way that an individual responds to a question -- a phenomenon familiar to anyone who has had to take a test in school while ill. Similarly, external conditions such as heat, light, ventilation, or even momentary distraction can affect one's responses in a way that does not reflect the underlying theoretical construct. Further, the subject's motivation can also affect the reliability of a data collection instrument. For example, for the most part teachers assume that their students are motivated to do well on any data collection instruments (e.g., a mid-term exam) given to them. However, the same assumption cannot be made when asking a random sample of individuals to answer questions on a data collection or assessment instrument. For instance, it is often difficult to get shoppers to cooperate in opinion surveys because they are intent on accomplishing their errands so that they can go home. Any incentive offered to entice participation in the survey, such as a crisp new dollar bill or a carton of instant macaroni and cheese, is weak compared to the motivation of students to do well in a course.
Difficulty in Understanding the Data Collection Instrument
Another source of variability in the way people respond to a data collection instrument is individual differences in the way that people interpret the questions on the instrument. Care must always be taken in the development of a data collection instrument to write at a level that can be understood by all the people who will answer the questions. The questions need to be written unambiguously and with proper spelling, grammar, and punctuation to reduce the chance that reliability suffers because respondents misunderstand what the questions are asking. For example, a child could easily take a question about a person who "lives near" him or her to mean the family members in the immediate household rather than a neighbor. Similarly, a question about how much one likes "sweet tea" means a different thing in the southern United States, where it refers to iced tea sweetened with simple syrup, than it does in Great Britain. The use of clear, concise language and operational definitions can help increase the reliability of the instrument.
Reliability problems may also stem from individual differences in the way that people interpret responses to a data collection instrument. Even in cases where the end points of the scale are operationally defined with clear examples, people who moderately dislike something could possibly vary their answers between 20 and 40 on a scale of 100, yet all mean the same thing. Similarly, some people never give a perfect score to anything on a rating scale because they believe that there is always room for improvement.
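One common way to see how such scale-use differences inflate error variance is to compare two hypothetical raters who agree completely on the ordering of the things they rate but use different regions of the response scale. The sketch below (all values assumed) standardizes each rater's scores within rater, which strips out individual scale use and exposes the shared underlying judgment; this is an illustrative technique, not a remedy the passage itself prescribes.

```python
import numpy as np

# Hypothetical ratings of five products on a 0-100 scale. The two raters
# share the same opinions but use the scale differently (assumed values).
rater_a = np.array([20, 25, 30, 35, 40], dtype=float)   # clusters low
rater_b = np.array([60, 70, 80, 90, 100], dtype=float)  # uses the top end

def zscore(x):
    """Standardize scores within a rater (mean 0, unit variance)."""
    return (x - x.mean()) / x.std(ddof=1)

# Raw scores show large, systematic gaps between the raters...
print(rater_b - rater_a)

# ...but within-rater standardization reveals identical judgments.
print(np.allclose(zscore(rater_a), zscore(rater_b)))  # True for these data
```

The raw disagreement here is pure scale-use error variance; the standardized scores agree exactly because the underlying opinions do.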
A lack of reliability can also arise when the data collection instrument is not valid because it is actually measuring more than one thing. For example, a researcher might set up an experiment to determine whether men or women are more likely to stop and assist a stranger on the street who needs help. This could be done by having a confederate drop a sheaf of loose papers and counting how many times a man stops to help and how many times...