in J. Schneider and J. Kitsuse, Studies in the Sociology of Social Problems, Ablex, 1984. I am appreciative of comments offered by Robert Perrucci, Diane Vaughan, and Ron Westrum.
Gary T. Marx
Massachusetts Institute of Technology
What do ABSCAM, the Santa Barbara Oil Spill, and the Freedom of Information Act have in common? Or what do blackmailers, police, priests, journalists, and some social problems researchers have in common? (Perhaps we'd better not answer that.) In the first case, aside from what they may communicate about the pathos of the last decades, each represents a means of collecting hidden and dirty data. These means are experiments, accidents, whistle blowing, and coercive institutionalized discovery practices. In the second case, we have actors who routinely deal with discovering secret and dirty data.
Issues of discovering and protecting secrets confront everyone in daily life. But they are highlighted for certain occupations such as public or private investigators (including detectives, inspectors general, Congressional investigators, auditors and spies), journalists, social reformers and sometimes social researchers. They have particular saliency for the social problems researcher who may seek data which insiders wish to keep secret.
As a result of research on substantive topics such as agents provocateurs, informants, undercover work, frame-ups, cover-ups, and muckraking, I have become interested in what can be called the "hidden and dirty data problem."
What follows is an essay on dirty data research. It offers neither an explanation, nor fresh empirical data. Instead its purpose is to call attention to this data gathering problem, suggest some of the issues it raises and a framework for approaching them, and speculate on what may be involved. I self-consciously raise a number of questions which I do not begin to answer adequately. This is a necessary first step in the generation of the more systematic empirical inquiry and theoretical development that is needed. In what follows, I define dirty data; consider some factors contributing to an apparent increase in the ease of discovering it; contrast some basic discovery mechanisms; and consider some of the implications of dirty data for the study and understanding of society.
By hidden and dirty data, I mean just that. We can locate it more precisely by combining two variables into a typology (Table 1). The first variable involves information which is publicly available, unprotected, and open, at one end of the continuum, and information which is secret, private, closed, or protected at the other. If we bring this together with the second variable, which is a continuum with nondiscrediting information at one pole and highly discrediting information at the other, we have the typology of Table 1.
Hidden and dirty data lie (no pun intended) in Type D: It is information which is kept secret and whose revelation would be discrediting or costly in terms of various types of sanctioning.
The data can be dirty in different ways. But in all cases, it runs contrary to widely (if not necessarily universally) shared standards and images of what a person or group should be. Of course, all persons and organizations have minor discrediting elements and show a gap between ideal standards, public presentations, and private reality (e.g., Hughes, 1971; Goffman, 1963). 1 But by dirty data, I have something rather more formidable in mind than soft-core discrepancies. Dirty data at the organizational level ought to be of particular concern to the social problems researcher. Issues of hidden and dirty data are likely to be involved to the extent that the study of social problems confronts behavior that is illegal, the failure of an agency or individual to meet responsibilities, cover-ups, and the use of illegal or immoral means.
While Hughes' (1971) concepts of "dirty work" and "guilty knowledge" may at times overlap with dirty data, they are distinct. Neither of the former has to be hidden, and they may be central to the legitimate license and mandate of an occupation or organization. Worker designations of selected tasks as "dirty work" may be a means of sustaining a heroic or moral definition of their occupation (e.g., Emerson and Pollner, 1976). Society needs dirty workers. Certain occupations must engage the profane and are empowered to violate standards that apply to others. However, because of the protected opportunity structure they face, such workers often generate dirty data.
Discrediting or dirty data and secrecy tend to go together. They can, of course, be independent and even inversely linked. Secrecy is a basic social process contributing to group boundaries (Simmel, 1950; Tefft, 1980). A mandate to use it can easily lead to its unintended expansion (e.g., Lowry, 1972). In the form of privacy, it represents an important societal value (Shils, 1966; Warren and Laslett, 1980). Organizations protect dirty as well as clean data. As Type B implies, this protection need not serve nefarious ends. Furthermore, all that is dirty is certainly not kept secret, as in the very interesting cases in Type C, where efforts to protect discrediting information are not taken. 2 But such subtleties aside, a major issue for the social problems researcher is how to pierce the secrecy that so often surrounds the subject matter.
Commentaries on field research of course often consider the difficulties in obtaining valid and reliable data (e.g., Douglas, 1976; Van Maanen, 1979). They assume that individuals and organizations present a veil of secrecy masking what is really going on. The researcher must find a way to lift or slip through, over, or under the veil. Both self and group boundaries are partially maintained by controlling information given to others and outsiders.
Well known barriers to data collecting are concern for privacy, suspiciousness of, or reticence towards, outsiders asking questions, a lack of reciprocity in the researcher-researched relationship, a desire to keep information from rivals or competitors, and a wish to put forward one's best face or group image. This is the case even when the data sought are not particularly discrediting. The problems are compounded, however, when we seek data that are in some way "dirty", as with some social problems, political sociology, and criminology research.
In these cases, data gathering is even more difficult. The vested interest in maintaining secrecy may be much stronger because illegal or immoral actions are involved and the costs of public disclosure very high. We may be dealing with people who are specialists at maintaining secrecy and deception. They may be part of organizations that routinely mislead or obscure. The issue can go beyond the withholding of information to offering what, in the intelligence trade, is called "misinformation" and "disinformation." Well kept secrets or deception may prevent the researcher from even knowing what to look for, questions of restricted access aside. How can the researcher hope to gather dirty data when the will and resources to block this are so strong?
Perhaps the most common response has simply been to stay away from such topics (the founding of the Society for the Study of Social Problems [SSSP] was a reaction against this). Like a river, researchers follow the path of least resistance. Or, perhaps better, like immigrants, we tend to go where, if we are not necessarily welcomed, we are at least tolerated. Often, of course, this is at the bidding (or at least with the resources) of the very elites who sit atop mountains of dirty data.
Yet, if it is valid to describe our attraction to more easily gathered "clean" data as a central tendency, such a characterization misses the considerable variation around the mean. While not the norm, nor a thriving industry, we do not lack for research of an investigative, muckraking, or scandal producing nature. The amount of such research has increased significantly in the last two decades, and it is possible that, human subjects limitations aside, such research will become more prevalent and prominent.
Factors Contributing To Increased Accessibility
Whether or not the relative amount of dirty data in our society has been increasing or decreasing is an interesting, and probably unanswerable, question. 3 However, there can be little doubt that accessibility to dirty data has increased in recent decades. 4 A not insignificant number of journalists, social reformers, and researchers have been able to gather information on highly discrediting phenomena which, according to a conspiracy perspective on the dirty data problem, we should be unable to study (one is reminded of the mathematical proof that airplanes cannot fly). The record might be read to suggest that there is an abundance of riches here. The streets may not be paved with gold, but they are often lined with partly visible muck. Instead of a paucity of information, we may suffer from a kind of dirty data overload. Rather than a lack of access, the problem may be in deciding, in a context of abundance, just which dirty data should be focused on. What factors bear upon the increased accessibility of such data?
New resources and changing standards partly account for it. Public interest groups and foundations offer support and audiences for such data. Technical changes such as computer advances in the storing, retrieval, and analysis of data, new devices for unobtrusive data gathering, 5 and better measurement techniques and means of communication have greatly increased the capacity for such research. We are measuring more and better than ever before. Not surprisingly, some of what is measured has discrediting implications. 6
New laws and procedures such as the Freedom of Information Act, 7 the Buckley Amendment requiring access to one's own records, the many state and local "sunshine laws" requiring open meetings, recent legislation (e.g., Civil Service Reform Act of 1978) and judicial decisions protecting whistle blowing, toll free lines for anonymously reporting government fraud, ombudsmen programs and more formalized procedures for filing grievances, and new forms of public disclosure and reporting requirements offer a cornucopia of data. Entrepreneurs such as former CIA agent William Walter Buchanan may bring this material to our library microfiche machines. Buchanan summarizes, indexes, and reproduces for sale on a subscription basis recently declassified documents. A majority of large universities and research libraries subscribe, as does the USSR.
New organizations concerned with dirty data discovery have appeared, such as the University of Missouri's Center for Investigative Reporters and Editors, the Freedom of Information Clearing House, and an investigative organization made up of journalists, former congressional investigators, and lawyers put together by Watergate prosecutor Terry Lenzner. 8 A concern of many public interest groups is encouraging data collection on topics such as auto safety and energy.
There is an emerging dirty data methodology and increased sophistication in using it. Books, articles, and how-to manuals abound. Radical caucuses within academic disciplines and professional associations have contributed to it, as have college courses such as those on investigative journalism. The cohort of journalists, lawyers, and social researchers receiving its professional socialization during the 1960s through the mid-1970s has played an important role in such research.
Persons ready to work in the dirty data fields found a ready market for their crop. Career options, rewards, and publishing outlets were available. It is significant that "60 Minutes" was one of the most profitable and highly rated television programs in the 1970s. Even if one were to question whether accessibility has increased, the case for the increased ability to publish such data seems clear (though whether the publication of dirty data materials has increased proportionately or only absolutely, as the total amount of published materials has increased, cannot be determined without sampling and content analysis).
Righteous indignation, which once went into concern over gambling, prostitution, liquor, radicals, and ethnic groups, has found new targets such as consumer rip-offs, corruption, and environmental spoliation, where discovery and documentation may play a greater role. With improvement in many of the social, economic, and political aspects of American life, procedural issues may be taken more seriously. In fine Tocquevillian fashion, with improvement have come higher aspirations. The size of the gap between ideals and reality which the general public is willing to tolerate has been steadily reduced in the twentieth century. This may receive expression in efforts to make it easier to discover dirty data.
Of course, one can argue that the availability of dirty data is illusory, diversionary, and lulling. Following Marcuse, it could be argued that the belief in a free and open society masks what is really happening. The real dirt stays hidden, while the masses are titillated with what used to be Confidential magazine (and now People) revelations, or an occasional sacrificial goat like Agnew. In the cycle of infinite regress, which generates continual uncertainty for spies, seeming discoveries may be designed to throw you off, being faked or unrepresentative. It could be argued that the continued exposure of dirty data, rather than being shocking, becomes boring, and may indirectly perpetuate a corrupt system through generating public cynicism and lowering expectations. As Sherman (1978) observes, scandal as a mechanism for social change is limited. The high media visibility given to some data might also offer a distorted picture of how much discovery is actually going on. The costs of discovery and publication can be very high. At the extreme there is death, as with investigative reporters Don Bolles in Arizona and Paul Jacobs in California (the former was murdered and the latter died from exposure to radiation). There may be imprisonment, as with reporters who fail to reveal their sources when a judge demands it, or the loss of income (e.g., impounding of the royalties of former CIA employee Frank Snepp after the publication of his book Decent Interval in 1977).
Accessibility is also relative and tentative. While the U.S. may be more open than most countries, it could be appreciably more open in terms of adherence to laws currently in existence, as well as in terms of new laws and procedures designed to ensure an even greater degree of openness. There are also counter trends, such as increased organizational sophistication regarding protective measures (codes, paper shredders, debugging devices, and other electronic surveillance, lie detectors, and nondisclosure agreements). The Reagan administration's efforts to restrict the FOIA suggest the fragility of recent advances.
However, regardless of its broader meaning or whether ease of discovery and publication has increased, it is clear that dirty data exists and researchers sometimes make use of it. What is involved in the discovery of (or failure to discover) such data? Let us consider four broad ways that dirty data is discovered: uncontrollable contingencies, volition, deception, and coercion. (These methods are given in detail in Table 2.) In doing this we will look beyond the current changes considered above, to more enduring characteristics of American society, social structure, interaction, and personality which are conducive to disclosure and discovery.
Uncontrollable Contingencies
The complexities and interdependencies of modern life, which too often thwart efforts at rational planning and intervention for the public good, may also thwart conspiracies. Former spy and novelist John Le Carre, in speaking about intelligence operations, finds it "difficult to dramatize the persistent quality of human incompetence. I don't believe that it's ever possible to operate such a clear conspiracy (as in his novel The Spy Who Came in From the Cold) where all the pieces fit together." 9 While this may overstate the case, there is an element of indeterminacy in human actions which often works in favor of disclosure. Folklore and literary treatments such as Cervantes's "murder will out" and Shakespeare's "by indirections find directions out" capture elements of this.
Those involved with dirty data may face exposure or suspicion due to factors beyond their control. Failures, accidents, mistakes, coincidences, victims, fall-out, remnants, and residues can all offer indications of dirty data close at hand. Strictly speaking, these offer an opportunity, rather than a strategy, for data collection. The strategic elements emerge in the varying degrees of skill required to ferret them out. The event is also distinct from data collected about it which, of necessity, must be selective.
Some uncontrollable contingencies are "merely personal." Thus we learn that Richard Pryor, as a result of a fire in his home, was probably using cocaine, or that the late Governor Rockefeller, when he died in the company of a young assistant, may have been having an affair. But others are keys to organizational deviance and problems. The death of Dorothy Hunt in a plane crash, with thousands of dollars, tells us about Watergate hush money; miscarriages and infertility in Oregon and upstate New York reveal hazards of pesticides and industrial waste; the oil spill in Santa Barbara points to collusive relationships between the oil industry, academics, and government (Molotch and Lester, 1974); the mishap at Three Mile Island shows the failure of equipment and regulatory policies; the fire at the Las Vegas MGM Grand Hotel exposes fire and building code violations; the deterioration of a new building at the University of Massachusetts tells us about fraud in the awarding of construction contracts.
Traces or residue elements are separate from accidents and mistakes, and perhaps surer sources, in that for certain types of infractions they will always be present. 10 The difficulty is, of course, knowing how to identify and interpret them. Some trace elements are manifest and available to anyone: a missing person, signs of forced entry, missing documents, gaps on a tape, or red dye on money, clothing, or skin (from a canister slipped between currency that exploded shortly after a bank robbery). Other trace elements are latent: powder that can be seen only under ultraviolet light, fingerprints, electronic impulses, inadequacies of counterfeit money, or documents. These require special skills to discover. Many electronic surveillance devices emit signals which can be read. Most complex illegal undertakings will leave physical clues, whether fingerprints or the paper trail of laundered money. Some investigators have received notoriety for literally sifting through the garbage, looking for telltale signs. Instructional manuals and training materials are another trace element. Thus, the Supreme Court in the Miranda decision drew upon such material as evidence of police violations of Constitutional protections.
Trace elements involving victims are likely to become publicly known to the extent that (a) the gap between victimization and its discovery is short, (b) the victim is personally identifiable, (c) the victim is aware of the victimization, and (d) the victim does not fear retaliation for telling others about it. There is a parallel here to the ease of discovering victim as against victimless crimes. The former are much more likely to be known about.
Trace elements, of course, need not be physical. One clue to possible dirty data lies in an organization's internal rules and policies, external laws concerning it, and professional codes of ethics. As Durkheim suggested, their presence is often a sign that members will face temptations to behave contrariwise. Their presence is also a clue to the presence of pressures toward social control corruption. Sociologists can draw upon their knowledge of organizations for clues to where dirty data is likely to be found. Certain structural and cultural characteristics can serve as likely barometers of dirty data. The developing literature on organizational deviance contains many clues (Needleman and Needleman, 1979; Sherman, 1980; Finney and Lesieur, 1982). Elements of folk culture such as humor, slang, nick-names, graffiti, and gossip can also offer clues.
Volition
This is a broad category. Whistleblowers, informants, and overt participant and non-participant observers share in the willing provision of discrediting information.
Given the prestige of scientific research, it is not surprising that many persons are willing to participate in large-scale anonymous self-report studies regarding their criminal or sexual behavior. Many researchers have followed in the path of Wallerstein and Wyle (1948) and Kinsey et al. (1948) in using this method. More surprising is the extent to which dirty data revelations can come forth without anonymity. Interviews and observation are the major sources for such data. Memoirs, biographies, letters, and other personal documents are important and under-used sources. Social and psychological factors can be conducive to revealing secrets.
Accounts of fieldwork often suggest that, once rapport and trust are established, people are often only too willing to talk (e.g., aside from the legions of investigative reporters, social researchers such as Polsky, 1967; Ianni and Ianni, 1972; Chambliss, 1978; Klockars, 1974; Galliher, 1980; and Millman, 1977). Primary group relations are partly based on sharing information. This can be a means of expressing solidarity. Informants may wish to help. They may have a desire to be understood and to explain their actions. There may be pride in their technical skills for which recognition and aesthetic appreciation is sought. They may feel a need to justify their involvement with dirty data, have a Dostoyevskian compulsion to tell, or enjoy the sense of power noted by Simmel (1950) that comes from sharing secrets. Isolated from opposing definitions and confident in their actions, they may be open because they do not see their behavior as discreditable. Insiders can also be used as researchers to give otherwise unavailable access, e.g., Walker and Lidz's (1977) use of Black street addicts.
The fact that there is frequently a lack of congruence between individual and organizational goals can also be conducive to the revelation of secrets. Information is a resource, just like income or authority. It can be used to damage rivals or traded to enhance one's own position. Money and offers of immunity, or other help, can often buy information. There may also be hope of great riches and fame (and perhaps, for the lucky few, a movie or TV series) from writing a book about one's secret activities. In the first six months of 1980 alone, the CIA reviewed 22 manuscripts by former CIA agents. 11
Whistleblowing is a dramatic form which has increased in the last decade (Westin and Salisbury, 1980; Dudar, 1979; Bok, 1980; Nader et al., 1972; Peter and Taylor, 1972; Government Accountability Project, 1977). New laws and policies have attempted to encourage and protect it. For example, the 1977 Toxic Substances Control Act requires that employees and officials of chemical firms be instructed about their legal obligation to report chemicals posing substantial health or environmental risks. Here, not to whistleblow becomes illegal. In many government agencies, employees must report bribe offers.
New anonymous tip and complaint receiving mechanisms make it easier to whistleblow or report improper behavior. These vary from 911 mail boxes in some cities for reports of police abuse (police themselves are often major users), to TIP (turn in a pusher) programs, to toll free numbers where violations can be reported to government agencies. The first and best known of the latter is that of the General Accounting Office. It has received about fifty calls a day since it was established in 1979. Following its success, the Office of Management and Budget has required certain federal agencies to establish such lines. While the initial complaints are not made public, they may result in court cases which are public.
Some whistleblowing comes from highly idealistic persons who are shocked by what they see in the day-to-day operation of their agency. This type of whistleblowing is likely to be common in a society such as the United States, with Puritan roots and a highly moralistic political and cultural style. In instances where the occupation attracts idealistic people and where the gap between ideal standards and actual practices is large, whistleblowing is more likely. This seems to be the case for some whistleblowing within law enforcement (e.g., Wall, 1972). Conflict between the highly educated professional's sense of expertise and bureaucratic and political realities in large organizations is another source of whistleblowing. As both the need for professionals and bureaucratization increase, so too may whistleblowing (Perrucci et al., 1980).
Beyond generating data about which trumpets, let alone whistles, could be blown, organizations also may generate personal motives. Complex organizations do not reward people equally. Some persons are likely to be angered over blocked mobility or rewards they see as insufficient.
Whistleblowing involves a conflict between the person who would tell and the organization (or at least its dominant leaders) who wish to keep things quiet. More subtle and more common is the leak, where the release of discrediting information is a device for serving some other organizational purpose. There is also a category of "give-aways," where persons do not realize they are revealing dirty data. This may be because of its highly technical nature, because its providers don't know that you have other data with which you can match it, or because neutral data may become dirty in time, as new developments such as accidents, illness, or environmental spoliation become manifest and lead to reassessment and reinterpretations. 12
Some dirty data appears at the intersection of several pieces of conventional data which may be easily available. This can be the case with research on the concentration of economic and political power (e.g., Domhoff, 1979; Useem, 1980). Other examples can be seen in the case of people processing institutions (e.g., commitment to mental hospitals) whose formal records describing a person's behavior may conflict with accounts of friends and family and coworkers. When these are juxtaposed and in contradiction, the researcher may wonder about the agency's actions.
The fact that dirty data is often available should not cause us to miss the point that it is less available than clean data. We are dealing with conflict relations and the need for secrecy. The active investigator need not wait for accidents to occur, or whistleblowers to come forward. He or she can also take the initiative. More subject to the control of the researcher are methods involving deception and coercion. These assume an adversarial or conflict model.
Deception
The use of deception is familiar in social science research, particularly that of a social psychological nature. However, it has been used far less to gather dirty data than for other reasons (e.g., to gain access to worlds normally denied the researcher, to gain data on matters kept private, to not bias responses by telling people what is being studied or that a study is being done, and to manipulate the participant's situation in accordance with notions of causal variables, or, as with Candid Camera, merely to see what happens). There is a sizeable literature on deception in traditional social psychological experiments, on covert participant observation, and on information collected under false pretenses (Kelman, 1977; Hilbert, 1980; Humphreys, 1975; Warwick, 1975). But there has been little discussion of the role (and power) of deception as an information gathering strategy involving "reality experiments" in dirty data contexts.
Here the logic of the experiment may differ from its more conventional social science usage involving control and experimental groups. There may be no control group, since the goal is empirical description rather than an effort to test causal theory. More refined inquiries, often at a later point in time, or research where the dirty data lies in documenting a pattern of differential treatment by race, sex, offender status, etc., will have the traditional control group.
After a long, relatively dormant period, muckraking journalism has shown increased vigor. Contemporary investigative journalism, whether newspapers such as the Chicago Sun-Times, or television programs such as "60 Minutes" or "20/20," offers many examples. The Sun-Times has been a leader in use of the technique. For example, in an investigation of insurance fraud, it had its investigators pose as victims of minor accidents with whiplash. They were then led down the road to hospitalization and insurance fraud by unwitting lawyers, doctors and hospital staff. Chiropractors came to treat "injuries" created by lawyers. In some cases, expenses of $40,000 were generated, treating essentially healthy investigators. In another case, reporters opened a Chicago bar and documented requests from various government agents for bribes.
Social reform groups have used the tactic to document and publicize problems. For example, the Chicago Better Government Association's inquiry into voter fraud had two investigators assume the life of winos and move into a skid row flop house. They registered under names such as James Joyce and Ernest Hemingway. When the voter lists turned up a short time later, Joyce and Hemingway were on them and actually voted! (Los Angeles Times, Oct. 9, 1977).
There are a great many law enforcement examples. Some of the more elaborate involve police posing as fences, pornographers, or sheiks bearing bribes as in ABSCAM.
The social research literature shows a smattering of deception used in this fashion. Schwartz and Skolnick (1962), in seeking to study the effect of a criminal court record on employment opportunities, had an "employment agent" visit a sample of one hundred employers. The agent presented one of four fictitious employment folders. The folders were exactly the same, with the exception of the criminal record (this varied from no record, to acquitted, to convicted and sentenced). As the seriousness of the record increased, chances for employment decreased; even those who were acquitted had much less chance of being hired. In Los Angeles, Heussenstamm (1971) sought to study claims of police harassment of the Black Panthers. Fifteen "typical" students with "exemplary" driving records agreed to put Black Panther party bumper stickers on their cars. The cars were in good condition and the students drove as they normally did. They received 33 citations in 17 days, and the study had to be stopped. Phil Zimbardo (1969) left what appeared to be abandoned cars on New York City and California thoroughfares and watched to see if they would be dismantled.
The National Wiretap Commission's interest in the availability of illegal wiretaps led its investigators to call 115 randomly chosen private detective agencies in seven large cities. They identified themselves as businessmen interested in tapping a rival's phones. In more than a third of the cases, the agencies contacted offered to install the illegal taps; many that refused offered "to show the callers how to do it themselves" 13 (O'Toole, 1978:75).
Selltiz (1955) reports an early experiment that used matched pairs of Black and White diners to assess discrimination in restaurants. HUD has used an equivalent tactic in studying housing discrimination. Black and White auditors who were otherwise similar responded separately to the same rental-sale opportunities. They found Blacks were systematically treated less favorably and courteously than Whites. 14 The Supreme Court in a 1982 decision ruled that a "tester" who is misled has standing to maintain a claim for damages under the 1968 Fair Housing Act.
Jesilow and O'Brien (1980), in a study of deterrence and automobile repair fraud, had women approach randomly chosen garages with the story that they were changing residences and their cars would not start. They told the repairmen that the car battery was in the trunk of the borrowed car they were driving, and requested the battery be tested. The battery, of course, was fine. Depending on the group sampled (experimental, control, pre- or posttest), from five to twenty percent of the time a new battery was recommended.
Pontell et al. (1980) have proposed using patients with prediagnosed common ailments to study fraud in government-funded medical benefit programs. The method involves using patients with equivalent symptoms. Some of the patients are entitled to health care benefits. All are to pay out of their own pockets for services rendered after each visit. The quality of treatment received and suggested would then be rated by a panel of doctors, and records of the insuring agencies checked to see if double-billing occurs. In a related case, investigators (including a U.S. senator) for a Senate Subcommittee on Long-Term Care (1976) visited medical clinics. They complained of colds and other minor symptoms. The inquiry documented numerous examples of fraud, and inferior and unnecessary care.
While not dirty data in the sense of the examples considered here, which have a willful quality, experiments designed to test the quality of professional diagnosis and skills may make similar use of deception. For example, Rosenhan (1973) discovered that healthy "pseudo-patients" who checked themselves into mental hospitals with vague symptoms could be diagnosed as schizophrenic.
Infiltration or covert participant observation is another form of deception. Its main domestic users are police, industrial espionage agents, and occasionally journalists, activists, and researchers. It involves fitting into some ongoing set of activities rather than generating new organizations and activities. To the extent that the infiltrator takes a more active role, trying to consciously influence what happens, the method has some of the quality of an experiment. At one extreme, the observer functions like covert electronic surveillance, merely transmitting what is going on; at the other, is the researcher as agent provocateur. 15
Coercion
This includes a variety of means that share compulsion as the essential mechanism. People or organizations are required to furnish information under threat of various penalties ranging from imprisonment, fines, and revocation of license to the withholding of goods, services, or privileges desired by the possessor of the information. Through laws, courts, and policies, government, with its power to coerce, is the major source of institutionalized discovery practices.
Investigations and hearings by Congressional, presidential, state and local commissions, and various government agencies such as the General Accounting Office and Office of Management and Budget with subpoena powers are major sources of information. In the case of historical data, most files are open to researchers after 50 years. Annual reports and reports of inspectors general can provide rich data. The Freedom of Information Act is particularly relevant to the needs of researchers, though in most states there is nothing like it at the local level. Reiss and Biderman (1980) have compiled a list of federal data sources available for the study of white collar crime.
Court records, including indictments, testimony, and evidence, can offer valuable information. However, one must know where to look and have resources for generating a transcript from the court record. If the case is not appealed, a transcript will be unavailable. New federal discovery rules require prosecutors to make available to the defense information in their possession relevant to the case. Grand jury data, which in general is kept secret, can be very powerful. Another good source of data lies in the routine reports that many occupational groups are required to file, e.g., for doctors (prescriptions, tissue samples), for police (bribery attempts, use of force, weapons discharge), and for those in Congress (campaign contributions, conflict of interest).
Lawyers and criminal investigators may use coercive confrontation tactics to gather information. These vary from threats to subpoena, arrest, or sue, to blackmail and the use of force. (While less prevalent now in the U.S. than previously, the third degree is one such technique.)
Brief mention can also be made of a nonconventional data gathering technique which falls between the above methods: ESP. This method is apparently taken quite seriously by Eastern European and Israeli intelligence agencies, who are engaged in extensive research on it (Deacon, 1977). Police departments also have experimented with the use of psychics. Were it to be used in social research, it would raise interesting issues of validity, ethics, and privacy; it is possible that, in the future, ESP may emerge as still another form of gathering dirty data. What type of technique is it? It might be classified as coercion because it is against, or independent of, a person's will; as deception because it is carried out covertly; or as an uncontrollable contingency, because of the residue or traces given off by thought and behavior which the expert draws on. It is also interesting to speculate on what form counterespionage might take.
Protecting Dirty Data
The would-be discoverer of dirty data must ask, how do organizations attempt to protect their information? The other methods of discovery we have considered have counterparts in actions taken to protect secrets.
Organizations may attempt to limit the damage from accidental or coincidental discoveries by diffusing and hiding responsibility, by having "need to know" rules (even for those who are highly trusted), by compartmentalizing activities, by using code names and a cell organizational structure, by delegating dirty work in a nontraceable way, by having mechanisms which insulate higher status persons from traceable "contamination," by eliminating witnesses, and by having contingency cover-up plans. Paper shredders and refuse burned under guard are means of thwarting garbage detectives.
Efforts to avoid informing and whistleblowing can be seen in recruitment, socialization, and sanctioning patterns. Background investigations are one means. Recruiting on the basis of ethnicity, or friends or relatives, is also thought to increase reliability. Kinship was a prominent device in underground networks in World War II and is a factor in some criminal enterprises. Loyalty may be cultivated by good working conditions and rewards, as well as appeals to shared values. In a kind of institutionalized blackmail, culpability may be built in by requiring or maneuvering employees into participation in illegal or potentially discrediting actions. To indict the organization would thus be to indict oneself. Long training, testing periods, and, as employees come to prove their reliability, gradual exposure to an organization's secrets are other devices. Contracts binding employees to secrecy and subjecting them to civil and criminal penalties for divulging information are also relevant. The CIA, for example, claims the right to censor what its former employees publish and, under some conditions, can stop them from getting royalties. Violence against informants, stigmatizing the tattletale, psychiatric labeling, dismissal, the loss of a pension, and blackballing are other devices intended to deter the sharing of secrets.
Awareness of deception and institutionalized coercion as information gathering tactics may give rise to a variety of strategic actions designed to mislead and limit what can be discovered. Thus, the deception of the investigator may be matched by the counterespionage of the target who gives out or permits false data to be discovered. The requirement that reports be filed or testimony given does not ensure that they will be accurate. Awareness of legitimate and illegitimate electronic surveillance may mean electronic countersweeping, or restricting key communications to areas where bugs are unlikely. Communications can be faked, guarded, and disguised when a wiretap or bug has been discovered. There is a dialectic between discoverers and keepers of secrets, as they reciprocally adjust their behavior. A discoverer's advantage may be only temporary, or work best on amateurs who have not found a form of neutralization.
Contrasting the Methods
We have considered four general methods by which researchers can obtain hidden and dirty data. 16 How do these methods compare with respect to criteria such as ethics, representativeness, reactivity, susceptibility to the researcher's initiative, range of topics covered, validity, skill requirements and costs? Table 3 contrasts the methods with respect to these criteria. Without going over every cell in the table, let us highlight some of the major characteristics of these methods.
The major advantage of the deceptive experiment is, of course, its incredible power to pierce the protective shield of secrecy that is likely to surround those involved with dirty data. It yields primary data, is subject to the researcher's initiative, and permits testing hypotheses. With adequate resources, one can take a sample of appropriate settings or subjects and hence make a case for its representativeness (though of just what can be problematic, as will be noted).
Replication and a degree of control over the variables involved are possible (although with field experiments such control is always limited). Skill requirements are moderate, although imagination is needed to develop the precise tactics, and skill in acting may be required.
The major disadvantage of the method involves questions of ethics and the meaning of the data. There are potentially corrosive and problematic aspects of using such tactics. Deception involves important ethical issues such as lying, invasions of privacy, manipulation, and involving subjects without their consent. In getting at the dirt, one may get dirty oneself. Seeking data on illegal actions may draw the researcher into illegal activities, and he or she may face temptations not usually considered in graduate methodology classes.
For reasons of resources and ethics, a rather narrow range of issues has been covered using the deceptive experiment. For social researchers, it is best used in attempting to document a pattern of victimization of persons that involves specific actions at one point in time. The dirty data appears as a result of actions taken by the target relative to a subject, e.g., fake voter registration, discrimination in housing, employment, or law enforcement, or consumer fraud. The dirt can lie in a clear violation of laws or policies, or in a more subtle misuse of discretion. The researcher presents him or herself as a client, patient, stooge, or ally, and sees if the hypothesized behavior is, or appears to be, forthcoming. There are limits here, as the researcher is unlikely to wish to take this to a point where actual damage is done, or the law is violated, as with unnecessary surgery, actually paying a bribe, or purchasing contraband.
It is much easier to become involved in an ongoing setting at one point in time than to create an entirely new setting. Elaborate hoaxes, such as fake criminal enterprises run by police, generally go far beyond the resources and competencies of the researcher. The cost, skill requirements, and risks of discovery increase the more complex the deception and the longer it goes on.
The data from deceptive experiments can be questioned with respect to both their validity and generalizability. One source of error lies in the strategic actions of those with secrets to protect. They may discover the experiment and take what amounts to counterespionage actions, giving out deceptive information themselves. Thus a finding of no dirty data must always be considered in light of the question, "was the target suspicious and hence behaving in an atypical way?" A second source of error lies in reactivity or entrapment, as subjects respond to subtle or obvious pressure from the researchers. The degree of passivity is an important factor here. Does the researcher merely offer an opportunity, or go beyond this to actively encourage the subject to use it?
Even if the results are valid, the intervention may have an artificial quality, making it unclear what it should be generalized to. As with any experimental undertaking, even those done in natural settings, there is the question of whether it is representative of real-world settings or merely other experimental settings. For example, this seems to be the problem with discovering that persons will respond criminally to deceptively provided opportunities that are appreciably more tempting than those likely to be found in reality, or discovering (using Black and White investigators) that there is housing discrimination against Blacks in elite White areas in which few Blacks may wish, or be in a position, to live. An important factor here (and one recognized by the courts in criminal cases) is how closely the artificial setting corresponds to those in the real world.
Deception in the case of infiltration shares many of the advantages and disadvantages of the deceptive experiment, yet some differences can be noted. Since it goes on for a longer period of time and is more diffuse, the range of topics or issues covered is likely to be much broader than with experiments.
The risks, ethical issues, and temptations faced are likely to be far more serious than with the onetime experiment, especially those where the researcher is offered as potential victim. Joint illegal actions may be demanded as the price of access, discovery of one's true identity can lead to physical harm, and the researcher with knowledge of others' wrongdoing can be compelled to testify. The researcher does not have the immunity or right to confidentiality that law enforcement, medical, and religious professionals have.
Since there is usually only one covert participant, cross-observer data cannot be a source of validation. Representativeness is also likely to be an issue, since access is often a matter of idiosyncratic local factors related to previous work experience or friendship patterns.
Data obtained through the various means of institutionalized coercion are unlikely to be otherwise available. In general, they do not raise profound ethical issues and are relatively inexpensive (generally involving only the cost of photocopying, though where elaborate FOIA searches or the transcription of court records are required, expenses can be large). The data may be available on a scale far beyond what the researcher could normally obtain.
However, the researcher has less initiative, and replication and control of relevant variables are rarely possible. Merely locating the information can be a problem. Court data are not centrally located nor indexed in ways that benefit the researcher. Even with FOIA requests, the researcher needs to know what he or she wants. Because such data is initially gathered for purposes other than research, it tends to have the usual drawbacks of secondary data. This sometimes is avoided by the researcher being able to advise commissions, committees, legislative bodies, and courts as to exactly what data should be gathered. With regularized reporting requirements that apply categorically, representativeness is not an issue. However, with grand jury or court proceedings, issues of representativeness may loom large. Cases for which data is available may represent particularly grievous instances, or idiosyncratic prosecutorial factors.
The validity of data gathered under coercion must be carefully scrutinized, given a conflict setting and the likelihood of strategic responses. In the case of court testimony, those facing criminal charges may have a strong incentive to lie, particularly if they are offered immunity. With respect to records, false or misleading reports may be filed. This may be a natural response for agencies (e.g., national security or law enforcement) professionally involved in dissimulation. Awareness that one's documents are subject to the Freedom of Information Act may mean that far less is written down, or that written records are destroyed as soon as possible 17 (thus, records prior to the 1966 passage of FOIA, or the stronger version in 1974 following Watergate, are likely to have the greatest validity). The concept of "maximum deniability" that surfaced during the Congressional hearings on U.S. intelligence activities in the 1970s certainly predates reporting requirements and the FOIA. Even where things are written down, the observer should be on guard. As a federal employee with extensive experience in the handling of the kinds of records researchers seek from government observes, "the 'cooking [falsifying] of records' has gone on as long as there has been any government anywhere" (The New York Times, July 11, 1982).
As a means of misleading their competition, businesses may patent their failures. Through tricks and subterfuge, reports filed may be technically, but not actually, correct. Thus, assets or conflicts of interest may be hidden. Silent partners may not be listed. Those listed as owners may merely be fronting for others. Bribe offers and payments may be made indirectly through lawyers. The most serious offenders may never file reports (for example, some businesses avoid licensing and reporting requirements by continually changing their name and place of business).
Information that is given voluntarily avoids many of the ethical issues around deception and coercion. To take the case of the whistle-blower, the ethical issues fall primarily on the bearer of the data, and much less on the researcher who uses it. Since the whistle-blower tends to come forward on his or her own, reactivity is not likely to be an issue. The range of topics covered is likely to be broad, the method is inexpensive, and it does not require a highly skilled researcher to use it.
Yet it clearly has disadvantages. As in the accident, the researcher's role is rather passive. The researcher usually must wait for the whistle-blower, though good field workers can sometimes gain equivalent information from their informants. Problems of representativeness are often present. It is difficult to know what the account of the whistle-blower represents. The rarity of whistle-blowing can be used to argue that the very atypicality of the case was what generated whistle-blowing in the first place. Great care must be taken with respect to validity. In coming from the horse's mouth, whistle-blowing can have a persuasiveness which anonymous, or outside, data sources lack. The personal motives of the whistle-blower can lead to distortions, exaggerations, and outright falsification. Skepticism and a critical attitude are necessary in the face of seeming data gifts, whether from whistle-blowers, accidents, or informants. Are they what they appear to be? The researcher must be especially careful when the whistle-blower's account supports the researcher's own ideological stance toward the organization or issue in question.
The uncontrolled contingency does not present serious ethical problems, though dealing with trace elements may involve violations of privacy. Since the data appears independently of the actions of the researcher, reactivity is not a problem, and it is inexpensive. Nor is validity likely to be an issue (though in highly controversial or conflictual areas there is always the possibility that a trace element or accident is not what it appears to be and was created to cover up something else, to damage someone, or to pursue self-interests). The researcher must always ask, was the accident faked, the evidence planted, the dirty artifacts, or data, counterfeited or contrived?
Unlike the other methods, the uncontrolled contingency is not a strategy that the investigator can initiate: replication and control of extraneous variables are rarely possible. Rather, it is an opportunity, which the researcher reacts to, although the investigator is aided by knowing where to look.
The occurrence of an accident is, of course, no guarantee that dirty data is present, nor, if it is, that it will become available. Its occurrence may be covered up and access to data about it denied. 18 The discovery of cover-ups raises the intriguing question of the ratio of discovered to successful cover-ups. Is success typical and discovery primarily due to incompetence or bad luck? Even if this is not the case, issues of representativeness plague generalizations from accidents. It is difficult to know what they are representative of.
When data are available, there may be major problems with respect to their meaning and interpretation. Seeing smoke does not necessarily tell you what kind of fire is present, nor what caused it, nor how to put it out. Facts do not speak for themselves, though some seem to whisper louder than others. This is part of a more general issue in social problems research. It is much easier to document a problem than to explain it.
Accidents can be a key to discovering that dirty data is present. But the collection of data is likely to depend on further actions. The publicity around an accident can mobilize resources and political support to set coercive data collection procedures (courts, grand juries, commissions) in motion.
Some Research Issues And Needs
My purpose in this paper has been to call attention to a type of data with particular relevance to social problems research and to contrast some methods for obtaining it. Substantively, the topic of dirty data touches many areas, including the sociology of knowledge and science, secrecy, stratification, face-to-face interaction, mass communications, and deviance and social control. It suggests a number of researchable questions:
Unlike the social researcher, most other secrecy investigators are themselves protected by secrecy, and need publicly document neither their sources nor their methods. These may be protected legally and by professional standards. Indeed, even results may be kept secret or used only as needed. But, given the demands of scientific communication, the researcher is expected to go public. He or she must describe where the data comes from and how it was collected. The standards of evidence for making a scientific claim are higher than for journalism or the law. In the case of journalism, for example, issues of causality, methodological failings, and representativeness generally receive little attention. Social researchers also differ from many government agents in that they cannot offer large sums for information, put together a grand jury, issue a subpoena, compel testimony, offer immunity, nor legally wiretap. 22 The closest we can come is occasionally advising government bodies, as with special commissions or courts that can do these things. Whether considering government agents, or journalists, our resources are more limited and our standards in general are more restrictive with respect to using coercion and deception. 23 Academic norms of civility and gentility operate against some of the more roughshod methods of others in the dirty data discovery business. 24
But beyond these differences, I think social researchers also make less use of these techniques as a result of lack of awareness and familiarity, and perhaps a generally less skeptical and cynical occupational world view than is the case for police, lawyers, and investigative journalists.
There is a small literature on dirty data research issues, particularly in considerations of fieldwork methods (e.g., Social Problems, Feb. 1980), social psychological experiments, and in the substantive areas of political sociology (around the study of power elites) and deviance-criminology. But this literature is restricted and has not dealt adequately with the new issues and opportunities of the last decade. Our methodology textbooks tend to be sadly lacking in guidelines for such research. We can learn a considerable amount from those professionals who routinely seek to discover dirty data. Their results may offer rich materials for secondary analysis. We might also make greater use of their methods for primary analysis, as some historians have done using the FOIA.
The array of methods we use could be broadened. Methodology texts and courses should give more attention to these methods and resources for obtaining dirty data. We should become as familiar with the University of Missouri's CIRE (Center for Investigative Reporters and Editors), as with the University of Chicago's NORC. Students should be taught how to use the Freedom of Information Act (e.g., see Committee on Government Operations, 1977; Center for National Security Studies, 1979), just as they are taught how to draw a sample. We should communicate to our students the joy of discovery and also what a discovery motion in court is. Berman Associates' checklist of Congressional Hearings, and the monthly list of General Accounting Office reports, should become standard references. We should scan the periodic lists offered by the Nader-sponsored Freedom of Information Clearing House, just as we scan listings of government and foundation grants. The Carrolton Declassified Documents Reference Service should become as well known to us as the Yale Human Relations Area Files. Just as some of us have learned how to contact the Census for demographic data, we should learn how to contact the Clerk of a court for legal documents. Civil and Criminal Court indices should be as well known as those at the Roper Center or the Inter-University Consortium.
While researchers sometimes consult "Who's Who?", they should also be consulting an obscure, formerly classified, publication put together by CIA Office of Security specialist Harry J. Murphy, entitled "Where's What: Sources of Information for Federal Investigators." This is a marvelously rich compendium of sources of personal information such as private directories and government files. Methodological stalwarts, such as Lazarsfeld and Hyman, must make room for outsiders such as Mollenhoff's Investigative Reporting (1980) and Williams' Investigative Reporting and Editing (1977). First Principles, Mother Jones, and 7 Days should take their place alongside the more established academic journals whose contents we skim.
While I do not suggest that we learn to wiretap, if wiretap data presents itself, we should not necessarily ignore it. Furthermore, we should know where to look to find out if it is presenting itself. As Horatio Alger noted, the knock of opportunity does little for those who do not hear it. The ethical problem here is somewhat similar to that raised by whether or not a university or church should accept tainted money (as from a slum lord).
While the ethical issues are not to be taken lightly, and will limit us relative to actions that may be justified in war time, or even those routinely taken by police, I think sociologists can go further and be more imaginative than we have been in the kinds of natural field experiments we attempt. There is a need for standards and discussion in this area to be sure (Keyman, 1977; Klockars and O'Connor, 1979). However, perhaps different standards with respect to deception, privacy, informed consent, and avoiding harm to research subjects ought to apply when the subjects themselves are engaged in deceitful, coercive, and illegal activities, and/or where one is dealing with an institution which is publicly accountable. 25 Even without resorting to ethically questionable methods, an astounding amount can be discovered through intelligence, knowing where to look and what to look for, diligence, and the cultivation of sources. The career of I.F. Stone, with its heavy reliance on congressional hearings, attests to this. Preferring publicly available information, and without resorting to deception, he has been a one-person discovery machine.
Many of the topics dear to the hearts of social problems researchers could be better illuminated were we to make greater, though restrained, use of methods for discovering dirty data. Yet the researcher in this area must judiciously walk a hazy line between the unacceptable extremes of taking the world at face value and believing that what is unseen is unimportant, as against thinking that nothing is what it appears to be and that whatever is hidden must, therefore, be significant. The presence of secrecy is a guarantee of neither theoretical nor social relevance. Even where dirty data is scientifically and socially relevant, respect for the law and individuals' rights must be carefully balanced against the scholar's concern with discovering the truth and contributing to reform. There are many instances where the former will preclude the latter. In spite of such concerns, increased attention to dirty data methods, topics, and issues is one factor required for better understanding of social problems.
Notes
A recent trend in race relations research involves hidden manipulations and unobtrusive measures of racial response in field settings. Unobtrusive studies of race and helping, aggressive, and nonverbal behavior find greater discrimination than would be expected from survey data (Crosby, Bromley and Saxe, 1980). Since the subjects of such research tend to be chosen from diffuse publics (a person walking on the street or shopping in a supermarket), rather than organizations thought to discriminate, the direct relevance of such studies to social problems research or amelioration is limited.
Tables
A. Nonsecretive and nondiscrediting data: Routinely available information.
B. Secretive and nondiscrediting data: Strategic and fraternal secrets, privacy.
C. Nonsecretive and discrediting data:
1. sanction immunity,
2. normative dissensus,
3. selective dissensus,
4. making good on a threat for credibility,
5. discovered dirty data.
D. Secretive and discrediting data: Hidden and dirty data.
TABLE 2: Methods of Obtaining Hidden and Dirty Data

| Method | Forms |
| --- | --- |
| Deception | Experiments, infiltration, covert surveillance, information gathering |
| Coercion | Institutionalized discovery practices, grand jury, Freedom of Information Act, discovery motions, record keeping requirements, manipulation, hypnosis, chemicals, ESP |
| Volition | Whistleblowers, leaks, gossip, informants, open field work, archives, personal documents |
| Uncontrollable Contingencies | Accidents, mistakes, victims, coincidences, traces, fall-out, residues, remnants |
TABLE 3: Some Criteria for Contrasting Methods for Gathering Dirty Data

| Method | Ethics: Researcher's Standpoint | Representativeness | Reactivity | Susceptible to Researcher's Initiative | Range of Topics Covered |
| --- | --- | --- | --- | --- | --- |
| Experiments | Problematic because of deception and tampering without consent | With sufficient resources, not an issue | A problem: "experimenter effects" | Yes | Narrow |
| Infiltrators | Problematic if researcher is involved in disguised participant observation, less if uses accounts of others | Often problematic, since access is rarely random | A problem | Sometimes | Broad |
| Uncontrollable contingencies | Not an issue, other than privacy questions | Likely a problem, hard to determine how typical an event is | Not an issue | No, except for reading trace elements | Narrow |
| Whistleblowers | Ethical issues fall mostly on the whistleblower | Hard to determine, reasons to expect person is atypical | Assuming initial good faith on whistleblower's part, not a problem | No, though institutional mechanisms may facilitate | Broad |
| Open field work | Can be problematic insofar as it involves conflict between loyalty to subjects and demands of outside society | Usually problematic, but with sufficient resources and access may not be | Can be a problem, but more amenable to control than with infiltrators | Yes | Broad |
| Matters of public record; Freedom of Information Act | Generally not an issue | Hard to determine, with respect to both what is recorded and what survives | An issue, insofar as subjects anticipate their data will become public and may shape it accordingly | Basically, no; researcher must pretty much take whatever was written down and made public | Intermediate |

| Method | Validity | Skill Requirements | Costs | |
| --- | --- | --- | --- | --- |
| Experiments | Not an issue except for problems of reactivity | Moderate | Moderate to expensive | New data |
| Infiltrators | Can be problematic: hidden agendas, lack of cross-observer validation | Good acting ability, willingness to take risks, role conflicts | Inexpensive, moderate if researcher is infiltrator | |
| Uncontrollable contingencies | Generally not an issue | Moderate | Inexpensive | Lends self best to discovery rather than explanation |
| Whistleblowers | Researchers must be skeptical and ask what hidden agenda the informant may have | Does not apply | Inexpensive | |
| Open field work | Less of an issue | Highly specialized | Moderate | Unlike other methods, does not assume a conflict relationship |
| Matters of public record | Less an issue of validity than relevance of what gets written down; given knowledge that it may become public can generate methods of institutionalized evasion | Minimal, but you need to know what you are looking for | Generally inexpensive | After-the-fact data, you need an idea of what to look for/ask for, can be a useful followup to leads from uncontrollable contingencies or whistleblowers |