Because of the nature of behavioral research, sociologists frequently use surveys and various other types of written data collection instruments (or their electronic equivalents) to obtain information from the people in their studies. Good data collection instruments have two characteristics in common: they are both reliable, consistently measuring whatever variable they measure, and valid, actually measuring what they purport to measure. There are several types of validity and concomitant approaches to determining the validity of an assessment instrument. Feedback from validation studies can be used to improve the quality of an assessment instrument and the data that behavioral scientists use to test their theories and describe the real world.
Keywords Correlation; Criterion; Data; Empirical; Operational Definition; Reliability; Sample; Survey; Survey Research; Validity
Research Methods: Validity
Data about human beings can be obtained from any number of sources, including observation of individuals or groups in either laboratory or real-world settings, historical data that have been collected for other purposes, and data collected by asking questions directly of individuals themselves regarding their opinions, attitudes, feelings, or past reactions. Data collected directly from individuals are typically gathered using various paper-and-pencil measurement instruments (or their electronic equivalents). Some of the more frequently used instruments of this type include surveys and questionnaires, personality tests, or even tests of mental ability. As anyone who has ever participated in a survey or taken a test knows, however, some data collection instruments are better than others. To be useful for scientific research, data collection instruments need to have two characteristics: They must be both reliable and valid.
Validity is the degree to which a survey or other data collection instrument measures what it purports or was designed to measure. For example, a survey that attempts to gather information about participants' attitudes toward candidates in a political election is valid if it indeed captures information about their attitudes toward the candidates rather than something else (e.g., their attitudes toward the person administering the survey). Reliability is the degree to which an assessment instrument consistently measures what it is intended to measure. No matter how well written a data collection instrument appears to be on its face, it cannot be valid unless it is reliable. In other words, if a measure is not reliable it does not consistently measure the same thing. This means that sometimes it is not measuring the construct that it was designed to measure, so the instrument is neither reliable nor valid. Both validity and reliability are essential when conducting survey research so that the data collected in the study will actually give researchers the information they are trying to gather.
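The dependence of validity on reliability can be made concrete with a small numerical sketch. It relies on a standard result from classical test theory, which the text above does not state: the observed correlation between an instrument and a criterion cannot exceed the square root of the product of their reliabilities. The function name `max_validity` and the reliability values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def max_validity(reliability_instrument, reliability_criterion=1.0):
    """Upper bound on an observed validity correlation (classical test theory).

    Assumption: both arguments are reliability coefficients between 0 and 1.
    The criterion defaults to perfectly reliable (1.0) for simplicity.
    """
    for r in (reliability_instrument, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return sqrt(reliability_instrument * reliability_criterion)

# Illustrative (made-up) reliabilities: the ceiling on validity falls as
# the instrument becomes less consistent.
for reliability in (0.9, 0.5, 0.1):
    ceiling = max_validity(reliability)
    print(f"reliability {reliability:.1f}: validity can be at most {ceiling:.2f}")
```

An instrument with a reliability of zero has a validity ceiling of zero, which is the numerical counterpart of the statement that an unreliable measure cannot be valid.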
Because of the nature of behavioral research, sociologists frequently use surveys and various other types of written data collection instruments to obtain information from the people in their studies. As opposed to research in the physical sciences where one knows without question the difference in weight between one gram and two grams of a chemical compound and can judge the reaction this change makes, measuring people's attitudes, opinions, and other subjective factors is less straightforward. For example, if a researcher wanted to determine how angry a certain situation made people, he or she could develop a continuum of actions that would empirically test people's anger level. On a scale of one to ten, a score of ten might be operationally defined as throwing a temper tantrum while a score of one might be operationally defined as no observable difference in behavior. The problem with this approach, of course, is that not everyone shows the same behavioral responses to situations even though they may be feeling the same emotion. Mrs. Jones may express her extreme displeasure with a sniff and a disapproving gaze while Mr. Smith may express his extreme displeasure by slashing someone's tires. If asked how angry they were on a scale of one to ten, however, both persons might reply that they were a 9.5. Researchers would need a better measure to determine how angry someone was.
Types of Validity
There are several different types of validity. Some of these types of validity are more appropriate to a discussion of the development of academic tests and organizational assessment instruments where real-world criteria of success exist. For example, if I want to develop a mid-term for one of my classes, I have relatively absolute criteria of success such as whether or not the students can recall various facts in the textbook. Similarly, if I want to develop a test that predicts how well an applicant will do on the job based on their experience and aptitudes, I have available various criteria of successful job performance for current employees whom I could match on various predictors (e.g., previous experience, scores on aptitude tests) that I think are relevant. From a behavioral research perspective, however, validating a data collection instrument can be more complicated. For example, as mentioned above, there is no absolute real-world criterion for anger. All I can do is ask people how angry they feel and take their word for it. This is true for most measures of attitudes, opinions, and the other subjective types of data that are of interest in many behavioral research studies.
Even though there are no objective criteria available against which one can test the validity of such an assessment instrument, it is still important to try to develop as valid an instrument as possible. There are several types of validity of interest for such instruments. Content validity is a measure of how well the instrument items reflect the concepts that the instrument developer is trying to assess. Content validation is often performed by experts in an appropriate field of study. For example, a psychologist could review an assessment instrument measuring "anger" and determine whether or not the questions appear to reflect the state-of-the-art knowledge about anger and its indicators, or an expert in early childhood education could determine the validity of an assessment instrument designed to test a child's reading level. Criterion-related validity is a measure of how well an assessment instrument measures what it is intended to measure as defined by another assessment instrument. Criterion-related validity, for example, could be ascertained by correlating the scores of the assessment instrument being validated with scores on another instrument that has been proven successful in assessing "anger." Construct validity is a measure of how well an assessment instrument measures an underlying theoretical concept ("construct") that the researcher has developed.
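The correlation step described for criterion-related validity can be sketched in a few lines of code. This is a minimal illustration, assuming the Pearson correlation coefficient is the statistic used (the text says only "correlating"); the helper `pearson_r`, the variable names, and all of the scores are hypothetical and do not come from any real study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)

# Made-up scores for the same eight respondents on two instruments:
# the new anger instrument being validated and an established one.
new_anger_scale = [12, 18, 25, 9, 30, 22, 15, 27]
established_scale = [14, 20, 24, 10, 33, 21, 13, 29]

validity_coefficient = pearson_r(new_anger_scale, established_scale)
print(f"criterion-related validity estimate: r = {validity_coefficient:.2f}")
```

A coefficient near 1 would suggest that the new instrument ranks respondents much as the established instrument does; a coefficient near 0 would suggest that the two are measuring different things.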
Other types of validity that may be of interest in sociological research include cross validity, predictive validity, and face validity. In cross validation, the validity of an assessment instrument is tested with a new sample to determine if the instrument is valid across situations. For example, the anger assessment instrument might be validated with high school students and then cross validated with working adults to see if it is valid in both situations or if it has only limited applicability. Predictive validity refers to how well an assessment instrument predicts future events. For example, a sociologist might develop a psychometric instrument to assess the presence of known risk factors for juvenile delinquency. The instrument could be administered to adolescents and their scores correlated with their later incidence of juvenile delinquency. If there were a high correlation between scores on the instrument and juvenile delinquency and the instrument also had high reliability, it could be used to identify adolescents who were at risk of becoming delinquent so that schools or social service agencies could intervene to counteract the risk factors. Rather than being a true measure of validity, face validity is merely the concept that an assessment instrument appears to measure what it is trying to measure. For example, an assessment instrument designed to collect data about anger would ask about the respondents' mental and behavioral reactions in various anger-provoking situations. This is not to say that...