Analysis of Secondary Data
Due to various ethical and logistical considerations, it can be impossible in some settings to gather primary data for research analysis. However, many sources of secondary data are available that can be further analyzed by researchers seeking to answer other research questions. Secondary analysis may be qualitative or quantitative in nature and may be used by itself or combined with other research data to reach conclusions. Although the use of secondary data can be more cost-effective than the use of primary data, the fact that the researcher has no control over how the data were collected means that there are several disadvantages as well. However, a well-designed meta analysis or other study that incorporates secondary data can be very useful to the researcher in answering questions about social issues and significantly aid in the advancement of the social sciences.
Keywords Experiment; Inferential Statistics; Inter-Interviewer Reliability; Interviewer Bias; Interviewer Effects; Meta Analysis; Qualitative Research; Quantitative Research; Secondary Analysis; Subject; Survey; Variable
Acquiring data for research in the social and behavioral sciences can be a difficult process necessitating the application of great creativity. Sometimes, ethical considerations mean that it is impossible to experimentally manipulate variables. For example, when studying the detrimental effects of length of unemployment, one cannot in good conscience randomly decide which subjects will lose their jobs and which ones will not, or for how long they will be without income. In other cases, the mere fact that a researcher is observing the subjects changes the way that the subjects act. For example, the Hawthorne Effect takes its name from a well-known study of the effects of lighting levels on assembly line employees at the Hawthorne Works of Western Electric outside Chicago. Researchers found that productivity increased not only when lighting levels were increased but also when they were decreased, because the subjects expected that the experimental interventions would enable them to increase productivity. In still other cases, it is simply not possible to gather the data needed for a research study for practical or logistical reasons. For example, to test the effectiveness of a new training program for aircraft maintenance personnel, one could easily design a controlled study to see whether personnel performed better after training or without training. It would be relatively simple to operationally define dependent variables for the study, including the number of fatal crashes. However, it is highly unlikely that any airline would be willing to risk the lives of its employees or customers to collect such data.
Fortunately, researchers are not restricted to the use of primary data (i.e., data that are collected specifically for the research study). Many types of secondary data that have been collected and analyzed for other purposes are often available for re-analysis. In secondary analysis, further analysis of existing data (typically collected by a different researcher) is conducted. The intent of secondary analysis is to use existing data in order to develop conclusions or knowledge in addition to or different from those resulting from the original analysis of the data. Secondary analysis may be qualitative or quantitative in nature and may be used by itself or combined with other research data to reach conclusions.
Sources of Secondary Data
Secondary data are available from many sources. In some cases, one must contact the researchers of previous studies and gain access to their data. In other cases, it may be possible to use public access data.
• veterans' issues, and women's issues. The Census Bureau can be accessed at www.census.gov.
• IPUMS, maintained by the University of Minnesota's Minnesota Population Center, is an integrated series of census microdata samples for US and international population studies. The data are intended for use by economists and social scientists. The data date back to the 1960s and include 80 samples from 26 countries, with more scheduled for release in the future. The IPUMS data can be accessed at www.ipums.umn.edu.
• The Bureau of Labor Statistics collects and maintains data on employment, earnings, living conditions, productivity, and other factors of interest to social scientists. The portal for the Bureau of Labor Statistics data is found at http://stats.bls.gov.
• The Inter-University Consortium for Political and Social Research (ICPSR) maintains the world's largest archive of digital social science data. The goals of the consortium are to acquire and preserve social science data, provide open and equitable access to these data, and promote their effective use. The ICPSR web site is found at www.icpsr.umich.edu.
In addition to these sources, secondary data can be obtained for analysis from a wide variety of other sources, including newspapers and periodicals, organizational records and archives, videotapes of motion pictures and television programs, web pages, scientific records (e.g., patent applications), speeches of public figures, votes cast in elections or by legislators, as well as personal journals, diaries, e-mail, and correspondence. Many other sources of secondary data are available depending on the needs of the researcher.
Advantages to Using Secondary Data
There are a number of advantages to using secondary data for analysis. As discussed above, there are certain situations in which it is impossible for ethical, logistical, or other practical reasons to collect primary data. The analysis of secondary data allows researchers to examine data collected for other purposes to answer their own research questions. For example, a study of the effects of unemployment could include the re-analysis of questionnaires routinely collected by government or private employment agencies. The re-analysis of previously collected survey data could also be used in some cases to answer other questions about the effects of various levels of the independent variable on the dependent variable without the presence of the researcher or other observer changing the results. Similarly, the routinely collected data in a historical study might be divided into groups for aircraft that had been worked on by technicians who had received the new training vs. those who had not. In addition, gathering data for secondary analysis is typically much faster because the data have already been collected. The researcher also does not have to develop a new data collection instrument or run a new experiment, which reduces both the time needed to gather the data and the costs associated with data collection. A major advantage of the analysis of secondary data is that the collection of such data is non-reactive. In other words, particularly for archival data, subjects will act naturally because they do not realize that their behavior is being observed and recorded. This advantage, of course, does not extend to data collected with surveys or direct observation where subjects know that their reactions are being observed.
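The aircraft maintenance comparison described above can be illustrated with a minimal sketch in Python. The records, field names, and defect counts below are hypothetical values invented purely for illustration; they are not drawn from any real data set. The sketch splits archived records into two groups and computes a Welch's t statistic, one of several inferential statistics a researcher might choose for comparing two group means.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical archival maintenance records (illustrative values only).
# "trained" marks whether the technician had received the new training;
# "defects" is the number of defects found at the next inspection.
records = [
    {"aircraft": "A1", "trained": True, "defects": 2},
    {"aircraft": "A2", "trained": True, "defects": 1},
    {"aircraft": "A3", "trained": True, "defects": 3},
    {"aircraft": "A4", "trained": True, "defects": 2},
    {"aircraft": "B1", "trained": False, "defects": 4},
    {"aircraft": "B2", "trained": False, "defects": 5},
    {"aircraft": "B3", "trained": False, "defects": 3},
    {"aircraft": "B4", "trained": False, "defects": 4},
]


def split_by_training(rows):
    """Divide existing records into trained and untrained groups."""
    trained = [r["defects"] for r in rows if r["trained"]]
    untrained = [r["defects"] for r in rows if not r["trained"]]
    return trained, untrained


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


trained, untrained = split_by_training(records)
print("Mean defects, trained group:  ", mean(trained))
print("Mean defects, untrained group:", mean(untrained))
print("Welch's t statistic:", round(welch_t(trained, untrained), 3))
```

Because the groups in archival data were not randomly assigned, a difference between them suggests an association rather than demonstrating that the training caused the change.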
Disadvantages of Using Secondary Data
On the other hand, the analysis of secondary data is not without its potential disadvantages. Unless one has collected the data oneself, it is virtually impossible to be completely confident in the quality of the data. Although the survey instruments associated with data sets may be available, one does not necessarily know what the inter-interviewer reliability is for surveys not under one's own control or whether interviewer bias or other interviewer effects may have tainted the data. Further, it is not always possible to find available data sets that contain the data that one needs to analyze. Another disadvantage in the use of secondary data arises from questions concerning the way subjects were selected. In most research studies, subjects are chosen to form a representative sample so that results can be extrapolated to the general population. However, just as it is not always possible to know if interviewer effects were unintentionally introduced into data collection, it is similarly impossible in many cases to know whether a sample selected by someone else is truly random or biased. When one uses primary data in research analysis, one can be confident about the way data were collected, samples were selected, and the relevance of survey items and other measurements to the research hypothesis. However, the same cannot always be said for analyses performed on secondary data. In samples where sampling error or bias occurs, any conclusions drawn from the data cannot be extrapolated to the population at large.
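The effect of a biased sample on conclusions can be illustrated with a minimal sketch in Python. The population below is synthetic (randomly generated weeks of unemployment), and the biased selection rule, in which only people unemployed for at least 26 weeks were recorded, is an assumption made up for this example. It is not a description of any real study.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic population: weeks of unemployment for 10,000 hypothetical people.
population = [random.randint(1, 52) for _ in range(10_000)]

# Simple random sample: every member has the same chance of selection.
random_sample = random.sample(population, 500)

# Biased sample: the original collector (by assumption) only recorded
# people who had been unemployed for at least 26 weeks.
biased_frame = [weeks for weeks in population if weeks >= 26]
biased_sample = random.sample(biased_frame, 500)

print("Population mean:   ", round(mean(population), 1))
print("Random sample mean:", round(mean(random_sample), 1))
print("Biased sample mean:", round(mean(biased_sample), 1))
```

The random sample's mean should fall close to the population mean, while the biased sample's mean is necessarily at least 26 weeks. A secondary analyst who did not know how the second sample was selected could badly overestimate typical unemployment duration.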
There are a number of issues that must be considered before embarking on a secondary analysis. First, if using secondary data collected using a survey instrument, it must be determined whether or not the wording of the question(s) of interest on the survey is a good fit for the data being used in the current analysis. If the wording is ambiguous or otherwise questionable for use in the current study, a better source of data needs to be found. When the results of a secondary analysis are reported, it is important to also consider the experimental...