Notes on the discovery, collection, and assessment of hidden and

Notes On The Discovery, Collection, And Assessment Of Hidden And Dirty Data

in J. Schneider and J. Kitsuse, Studies in the Sociology of Social Problems, Ablex, 1984. I am appreciative of comments offered by Robert Perrucci, Diane Vaughan, and Ron Westrum

Back to Home Page | Notes | Tables

Gary T. Marx
Massachusetts Institute of Technology

What do ABSCAM, the Santa Barbara Oil Spill, and the Freedom of Information Act have in common? Or what do blackmailers, police, priests, journalists, and some social problems researchers have in common? (Perhaps we'd better not answer that.) In the first case, aside from what they may communicate about the pathos of the last decades, each represents a means of collecting hidden and dirty data. These means are experiments, accidents, whistle blowing, and coercive institutionalized discovery practices. In the second case, we have actors who routinely deal with discovering secret and dirty data.

Issues of discovering and protecting secrets confront everyone in daily life. But they are highlighted for certain occupations such As public or private investigators (including detectives, inspectors general, Congressional investigators, auditors and spies), journalists, social reformers and sometimes social researchers. They have particular saliency for the social problems researcher who may seek data which insiders wish to keep secret.

As a result of research on substantive topics such as agents provocateurs, informants, undercover work, frame-ups, cover-ups, and muckraking, I have become interested in what can be called the "hidden and dirty data problem."

What follows is an essay on dirty data research. It offers neither an explanation, nor fresh empirical data. Instead its purpose is to call attention to this data gathering problem, suggest some of the issues it raises and a framework for approaching them, and speculate on what may be involved. I self-consciously raise a number of questions which I do not begin to answer adequately. This is a necessary first step in the generation of the more systematic empirical inquiry and theoretical development that is needed. In what follows, I define dirty data; consider some factors contributing to an apparent increase in the ease of discovering it; contrast some basic discovery mechanisms; and consider some of the implications of dirty data for the study and understanding of society.

By hidden and dirty data, I mean just that. We can locate it more precisely by combining two variables into a typology (Table 1). The first variable involves information which is publicly available, unprotected, and open, at one end of the continuum, and information which is secret, private, closed, or protected at the other. If we bring this together with the second variable, which is a continuum with nondiscrediting information at one pole and highly discrediting information at the other, we have the typology of Table 1.

Hidden and dirty data lie (no pun intended) in Type D: It is information which is kept secret and whose revelation would be discrediting or costly in terms of various types of sanctioning.

The data can be dirty in different ways. But in all cases, it runs contrary to widely (if not necessarily universally) shared standards and images of what a person or group should be. Of course, all persons and organizations have minor discrediting elements and show a gap between ideal standards, public presentations, and private reality (e.g., Hughes, 1971; Goffman, 1963). 1 But by dirty data, I have something rather more formidable in mind than soft-core discrepancies. Dirty data at the organizational level ought to be of particular concern to the social problems researcher. Issues of hidden and dirty data are likely to be involved to the extent that the study of social problems confronts behavior that is illegal, the failure of an agency or individual to meet responsibilities, cover-ups, and the use of illegal or immoral means.

While Hughes' (1971) concepts of "dirty work" and "guilty knowledge" may at times overlap with dirty data, they are distinct. Neither of the former have to be hidden, and they may be central to the legitimate license and mandate of an occupation or organization. Worker designations of selected tasks as "dirty work" may be a means of sustaining a heroic or moral definition of their occupation (e.g., Emerson and Pollner, 1976). Society needs dirty workers. Certain occupations must engage the profane and are empowered to violate standards that apply to others. However, because of the protected opportunity structure they face, such workers often generate dirty data.

Discrediting or dirty data and secrecy tend to go together. They can, of course, be independent and even inversely linked. Secrecy is a basic social process contributing to group boundaries (Simmel, 1950; Tefft, 1980). A mandate to use it can easily lead to its unintended expansion (e.g., Lowry, 1972). In the form of privacy, it represents an important societal value (Shils, 1966; Warren and Laslett, 1980). Organizations protect dirty as well as clean data. As Type B implies, this protection need not serve nefarious ends. Furthermore, all that is dirty is certainly not kept secret, as in the very interesting cases in Type C, where efforts to protect discrediting information are not taken. 2 But such subtleties aside, a major issue for the social problems researcher is how to pierce the secrecy that so often surrounds the subject matter.

Commentaries on field research of course often consider the difficulties in obtaining valid and reliable data (e.g., Douglas, 1976; Van Maanen, 1979). They assume that individuals and organizations present a veil of secrecy masking what is really going on. The researcher must find a way to lift or slip through, over, or under the veil. Both self and group boundaries are partially maintained by controlling information given to others and outsiders.

Well known barriers to data collecting are concern for privacy, suspiciousness of, or reticence towards, outsiders asking questions, a lack of reciprocity in the researcher-researcher relationship, a desire to keep information from rivals or competitors, and a wish to put forward one's best face or group image. This is the case even when the data sought are not particularly discrediting. The problems are compounded, however, when we seek data that are in some way "dirty", as with some social problems, political sociology, and criminology research.

In these cases, data gathering is even more difficult. The vested interest in maintaining secrecy may be much stronger because illegal or immoral actions are involved and the costs of public disclosure very high. We may be dealing with people who are specialists at maintaining secrecy and deception. They may be part of organizations that routinely mislead or obscure. The issue can go beyond the withholding of information to offering what, in the intelligence trade, is called "misinformation" and "disinformation." Well kept secrets or deception may prevent the researcher from even knowing what to look for, questions of restricted access aside. How can the researcher hope to gather dirty data when the will and resources to block this are so strong?

Perhaps the most common response has simply been to stay away from such topics (the founding of the Society for the Study of Social Problems [SSSP] was a reaction against this). Like a river, researchers follow the path of least resistance. Or, perhaps better, like immigrants, we tend to go where, if we are not necessarily welcomed, we are at least tolerated. Often, of course, this is at the bidding (or at least with the resources) of the very elites who sit atop mountains of dirty data.

Yet, if it is valid to describe our attraction to more easily gathered "clean" data as a central tendency, such a characterization misses the considerable variation around the mean. While not the norm, nor a thriving industry, we do not lack for research of an investigative, muckraking, or scandal producing nature. The amount of such research has increased significantly in the last two decades, and it is possible that, human subjects limitations aside, such research will become more prevalent and prominent.

Factors Contributing To Increased Accessibility

Whether or not the relative amount of dirty data in our society has been increasing or decreasing is an interesting, and probably unanswerable, question. 3 However, there can be little doubt that accessibility to dirty data has increased in recent decades. 4 A not insignificant number of journalists, social reformers, and researchers have been able to gather information on highly discrediting phenomena which, according to a conspiracy perspective on the dirty data problem, we should be unable to study (one is reminded of the mathematical proof that airplanes cannot fly). The record might be read to suggest that there is an abundance of riches here. The streets may not by paved with gold, but they are often lined with partly visible muck. Instead of a paucity of information, we may suffer from a kind of dirty data overload. Rather than a lack of access, the problem may be in deciding, in a context of abundance, just which dirty data should be focused on. What factors bear upon the increased accessibility of such data?

New resources and changing standards partly account for it. Public interest groups and foundations offer support and audiences for such data. Technical changes such as computer advances in the storing, retrieval, and analysis of data, new devices for unobtrusive data gathering, 5 and better measurement techniques and means of communication have greatly increased the capacity for such research. We are measuring more and better than ever before. Not surprisingly, some of what is measured has discrediting implications. 6

New laws and procedures such as the Freedom of Information Act, 7 the Buckley Amendment requiring access to one's own records, the many state and local "sunshine laws" requiring open meetings, recent legislation (e.g., Civil Service Reform Act of 1979) and judicial decisions protecting whistle blowing, toll free lines for anonymously reporting government fraud, ombudsmen programs and more formalized procedures for filing grievances, new forms of public disclosure and reporting requirements offer a cornucopia of data. Entrepreneurs such as former CIA agent William Walter Buchanan may bring this material to our library microfiche machines. Buchanan summarizes, indexes, and reproduces for sale on a subscription basis recently declassified documents. A majority of large universities and research libraries subscribe, as does the USSR.

New organizations concerned with dirty data discovery have appeared, such as the University of Missouri's Center for Investigative Reporters and Editors, the Freedom of Information Clearing House, and an investigative organization made up of journalists, former congressional investigators, and lawyers put together by Watergate prosecutor Terry Lenzner. 8 A concern of many public interest groups is encouraging data collection on topics such as auto safety and energy.

There is an emerging dirty data methodology and increased sophistication in using it. Books, articles, and how-to manuals abound. Radical caucuses within academic disciplines and professional associations have contributed to it, as have college courses such as those on investigative journalism. The cohort of journalists, lawyers, and social researchers receiving its professional socialization during the 1960s through the mid-1970s has played an important role in such research.

Persons ready to work in the dirty data fields found a ready market for their crop. Career options, rewards, and publishing outlets were available. It is significant that "60 Minutes" was one of the most profitable and highly rated television programs in the 1970s. Even if one were to question whether accessibility has increased, the case for the increased ability to publish such data seems clear (though whether the publication of dirty data materials has increased proportionately or only absolutely, as the total amount of published materials has increased, can not be determined without sampling and content analysis).

Righteous indignation, which once went into concern over gambling, prostitution, liquor, radicals, and ethnic groups has found new targets such as consumer rip-offs, corruption, and environmental spoliation, where discovery and documentation may play a greater role. With improvement in many of the social, economic, and political aspects of American life, procedural issues may be taken more seriously. In fine Tocquevillian fashion, with improvement has come higher aspirations. The size of the gap between ideals and reality which the general public is willing to tolerate has been steadily reduced in the twentieth century. This may receive expression in efforts to make it easier to discover dirty data.

Of course, one can argue that the availability of dirty data is illusory, diversionary, and lulling. Following Marcuse, it could be argued that the belief in a free and open society masks what is really happening. The real dirt stays hidden, while the masses are titillated with what used to be confidential Magazine (and now People) revelations, or an occasional sacrificial goal like Agnew. In the cycle of infinite regress, which generates continual uncertainty for spies, seeming discoveries may be designed to throw you off, being faked or unrepresentative. It could be argued that the continued exposure of dirty data, rather than being shocking, becomes boring, and may indirectly perpetuate a corrupt system through generating public cynicism and lowering expectations. As Sherman (1978) observes, scandal as a mechanism for social changes is limited. The high media visibility given to some data might also offer a distorted picture of how much discovery is actually going on. The costs of discovery and publication can be very high. At the extreme there is death, as with investigative reporters Don Bowles in Arizona and Paul Jacobs in California (the former was murdered and the latter died from exposure to radiation). There may be imprisonment, as with reporters who fail to reveal their sources when a judge demands it, or the loss of income (e.g., impounding of the royalties of former CIA employee Frank Snepp after the publication of his book Decent Interval in 1977).

Accessibility is also relative and tentative. While the U. S. may be more open than most countries, it could be appreciably more open in terms of adherence to laws currently in existence, as well as in terms of new laws and procedures designed to insure an even greater degree of openness. There are also counter trends, such as increased organizational sophistication regarding protective measures (codes, paper shredders, debugging devices, and other electronic surveillance, lie detectors, and nondisclosure agreements). The Reagan administration's efforts to restrict the FOIA suggest the fragility of recent advances.

However, regardless of its broader meaning or whether ease of discovery and publication has increased, it is clear that dirty data exists and researchers sometimes make use of it. What is involved in the discovery (or failure to discover) such data? Let us consider four broad ways that dirty data is discovered: uncontrollable contingencies, volition, deception, and coercion. (These methods are given in detail in Table 2.) In doing this we will look beyond the current changes considered above, to more enduring characteristics of American society, social structure, interaction, and personality which are conducive to disclosure and discovery.

Uncontrollable Contingencies

The complexities and interdependencies of modern life, which too often thwart efforts at rational planning and intervention for the public good, may also thwart conspiracies. Former spy and novelist John Le Carre, in speaking about intelligence operations, finds it "difficult to dramatize the persistent quality of human incompetence. I don't believe that it's ever possible to operate such a clear conspiracy (as in his novel The Spy Who Came in From the Cold) where all the pieces fit together." 9 While this may overstate the case, there is an element of indeterminacy in human actions which often works in favor of disclosure. Folklore and literary treatments such as Cervantes's "murder will out" and Shakespeare's "by indirections find directions out" capture elements of this.

Those involved with dirty data may face exposure or suspicion due to factors beyond their control. Failures, accidents, mistakes, coincidences, victims, fall-out, remnants, and residues can all offer indications of dirty data close at hand. Strictly speaking, these offer an opportunity rather than a strategy, for data collection. The strategic elements emerge in the varying degrees of skill required to ferret them out. The event is also distinct from data collected about it which, of necessity, must be selective.

Some uncontrollable contingencies are "merely personal." Thus we learn that Richard Pryor, as a result of a fire in his home, was probably using cocaine, or that the late Governor Rockefeller, when he died in the company of a young assistant, may have been having an affair. But others are keys to organizational deviance and problems. The death of Dorothy Hunt in a plane crash, with thousands of dollars, tells us about Watergate hush money; miscarriages and infertility in Oregon and upstate New York reveal hazards of pesticides and industrial waste; the oil spill in Santa Barbara points to collusive relationships between the oil industry, academics, and government (Molotch and Lester, 1974); the mishap at Three Mile Island shows the failure of equipment and regulatory policies; the fire at the Las Vegas MGM Grand Hotel exposes fire and building code violations; the deterioration of a new building at the University of Massachusetts tells us about fraud in the awarding of construction contracts.

Traces or residue elements are separate from accidents and mistakes, and perhaps surer sources, in that for certain types of infractions they will always be present. 10 The difficulty is, of course, knowing how to identify and interpret them. Some trace elements are manifest and available to anyone: a missing person, signs of forced entry, missing documents, gaps on a tape, or red dye on money, clothing, or skin (from a canister slipped between currency that exploded shortly after a bank robbery). Other trace elements are latent: powder that can be seen only under ultraviolet light, fingerprints, electronic impulses, inadequacies of counterfeit money, or documents. These require special skills to discover. Many electronic surveillance devices emit signals which can be read. Most complex illegal undertakings will leave physical clues, whether fingerprints or the paper trail of laundered money. Some investigators have received notoriety for literally sifting through the garbage, looking for telltale signs. Instructional manuals and training materials are another trace element. Thus, the Supreme Court in the Miranda decision drew upon such material as evidence of police violations of Constitutional protections.

Trace elements involving victims are likely to become publicly known to the extent that (a) the gap between victimization and its discovery is short, (b) the victim is personally identifiable, (c) the victim is aware of the victimization, and (d) does not fear retaliation for telling others about it. There is a parallel here to the ease of discovering victim as against victimless crimes. The former are much more likely to be known about.

Trace elements, of course, need not be physical. One clue to possible dirty data lies in an organization's internal rules and policies, external laws concerning it, and professional codes of ethics. As Durkheim suggested, their presence is often a sign that members will face temptations to behave contrariwise. Their presence is also a clue to the presence of pressures toward social control corruption. Sociologists can draw upon their knowledge of organizations for clues to where dirty data is likely to be found. Certain structural and cultural characteristics can serve as likely barometers of dirty data. The developing literature on organizational deviance contains many clues (Needleman and Needleman, 1979; Sherman, 1980; Finney and Lesieur, 1982). Elements of folk culture such as humor, slang, nick-names, graffiti, and gossip can also offer clues.

Volition

This is a broad category. Whistleblowers, informants, and overt participant and non-participant observers share in the willing provision of discrediting information.

Given the prestige of scientific research, it is not surprising that many persons are willing to participate in large-scale anonymous self-report studies regarding their criminal or sexual behavior. Many researchers have followed in the path of Wallerstein and Wyle (1948) and Kinsey et al. (1948) in using this method. More surprising is the extent to which dirty data revelations can come forth without anonymity. Interviewers and observation are the major source for such data. Memoirs, biographies, letters, and other personal documents are important and under-used sources. Social and psychological factors can be conducive to revealing secrets.

Accounts of fieldwork often suggest that, once rapport and trust are established, people are often only too willing to talk (e.g., aside from the legions of investigative reporters, social researchers such Polsky, 1967; lanni and lanni, 1972; Chambliss, 1978; Klockars, 1974; Galliher, 1980; and Millman, 1977). Primary group relations are partly based on sharing information. This can be a means of expressing solidarity. Informants may wish to help. They may have a desire to be understood and to explain their actions. There may be pride in their technical skills for which recognition and aesthetic appreciation is sought. They may feel a need to justify their involvement with dirty data, have a Dostoyevskian compulsion to tell, or enjoy the sense of power noted by Simmel (1950) that comes from sharing secrets. Isolated from opposing definitions and confident in their actions, they may be open because they do not see their behavior as discreditable. Insiders can also be used as researchers to give otherwise unavailable access, e.g., Walker and Lidz's (1977) use of Black street addicts.

The fact that there is frequently a lack of congruence between individual and organizational goals can also be conducive to the revelation of secrets. Information is a resource, just like income or authority. It can be used to damage rivals or traded to enhance one's own position. Money and offers of immunity, or other help, can often buy information. There may also be hope of great riches and fame (and perhaps, for the lucky few, a movie or TV series) from writing a book about one's secret activities. In the first six months of 1980 alone, the CIA reviewed 22 manuscripts by former CIA agents. 11

Whistleblowing is a dramatic form which has increased in the last decade (Westin and Salisbury, 1980; Dudar, 1979; Bok, 1980; Nader et al., 1972; Peter and Taylor, 1972; Government Accountability Project, 1977). New laws and policies have attempted to encourage and protect it. For example, the 1977 Toxic Substances Control Act requires that employees and officials of chemical firms be instructed about their legal obligation to report chemicals posing substantial health or environmental risks. Here, not to whistleblow becomes illegal. In many government agencies, employees must report bribe offers.

New anonymous tip and complaint receiving mechanisms make it easier to whistleblow or report improper behavior. These vary from 911 mail boxes in some cities for reports of police abuse (police themselves are often major users), TIP (turn in a pusher) programs, and toll free numbers where violations can be reported to government agencies. The first and best known of the latter is that of the General Accounting Office. It has received about fifty calls a day since it was established in 1979. Following its success, the Office of Management and the Budget has required certain federal agencies to establish such lines. While the initial complaints are nor made public, they may result in court cases which are public.

Some whistleblowing comes from highly idealistic persons who are shocked by what they see in the day-to-day operation of their agency. This type of whistleblowing is likely to be common in a society such as the United States, with Puritan roots and a highly moralistic political and cultural style. In instances where the occupation attracts idealistic people and where the gap between ideal standards and actual practices is large, whistleblowing is more likely. This seems to be the case for some whistleblowing within law enforcement (e.g., Wall, 1972). Conflicts between the highly educated professional's sense of expertise and bureaucratic and political realities in large organizations is another source of whistleblowing. As both the need for professionals and bureaucratization increase, so too may whistleblowing (Perrucci et. al., 1980).

Beyond generating data about which trumpets, let alone whistles, could be blown, organizations also may generate personal motives. Complex organizations do not reward people equally. Some persons are likely to be angered over blocked mobility or rewards they see as insufficient.

Whistleblowing involves a conflict between the person who would tell and the organization (or at least its dominant leaders) who wish to keep things quiet. More subtle and more common is the leak, where the release of discrediting information is a device for serving some other organizational purpose. There is also a category of "give-aways," where persons do not realize they are revealing dirty data. This may be because of its highly technical nature, because its providers don't know that you have other data with which you can match it, or neutral data may become dirty in time, as new developments such as accidents, illness, or environmental spoliation become manifest and lead to reassessment and reinterpretations. 12

Some dirty data appears at the intersection of several pieces of conventional data which may be easily available. This can be the case with research on the concentration of economic and political power (e.g., Domhoff, 1979; Useem, 1980). Other examples can be seen in the case of people processing institutions (e.g., commitment to mental hospitals) whose formal records describing a person's behavior may conflict with accounts of friends and family and coworkers. When these are juxtaposed and in contradiction, the researcher may wonder about the agency's actions.

The fact that dirty data is often available should not cause us to miss the point that it is less available than clean data. We are dealing with conflict relations and the need for secrecy. The active investigator need not wait for accidents to occur, or whistleblowers to come forward. He or she can also take the initiative. More subject to the control of the researcher are methods involving deception and coercion. These assume an adversarial or conflict model.

Deception

The use of deception is familiar in social science research, particularly that of a social psychological nature. However, it has been used far less to gather dirty data than for other reasons (e.g., to gain access to worlds normally denied the researcher, to gain data on matters kept private, to not bias responses by telling people what is being studied or that a study is being done, and to manipulate the participant's situation in accordance with notions of causal variables, or, as with candid camera, merely to see what happens). There is a sizeable literature on deception in traditional social psychological experiments on covert participation observations and on information collected under false pretenses (Keyman, 1977; Hilbert, 1980; Humphreys, 1975; Warwick, 1975). But there has been little discussion of the role (and power) of deception as an information gathering strategy involving "reality experiments" in dirty data contexts.

Here the logic of the experiment may differ from its more conventional social science usage involving control and experimental groups. There may be no control group, since the goal is empirical description rather than an effort to test causal theory. More refined inquiries, often at a later point in time, or research where the dirty data lies in documenting a pattern of differential treatment by race, sex, offender status, etc., will have the traditional control group.

After a long, relatively dormant period, muckraking journalism has shown increased vigor. Contemporary investigative journalism, whether newspapers such as the Chicago Sun Times, or television programs such as "60 Minutes" or "20/20," offer many examples. The Sun Times has been a leader in use of the technique. For example in an investigation of insurance fraud, it had its investigators pose as victims of minor accidents with whiplash. They were then led down the road to hospitalization and insurance fraud by unwitting lawyers, doctors and hospital staff. Chiropractors came to treat "injuries" created by lawyers. In some cases, expenses of $40,000 were generated, treating essentially healthy investigators. In another case, reporters opened a Chicago bar and documented requests from various government agents for bribes.

Social reform groups have used the tactic to document and publicize problems. For example, the Chicago Better Government Association's inquiry into voter fraud had two investigators assume the life of winos and move into a skid row flop house. They registered under names such as James Joyce and Ernest Hemingway. When the voter lists turned up a short time later, Joyce and Hemingway were on them and actually voted! (Los Angeles Times, Oct. 9, 1977).

There are a great many law enforcement examples. Some of the more elaborate involve police posing as fences, pornographers, or sheiks bearing bribes as in ABSCAM.

The social research literature shows a smattering of deception used in this fashion. Schwartz and Skolnick (1962), in seeking to study the effect of a criminal court record on employment opportunities, had an "employment agent" visit a sample of one hundred employers. The agent presented one of four fictitious employment folders. The folders were exactly the same, with the exception of the criminal record (this varied from no record, to acquitted, to convicted and sentenced). As the seriousness of the record increased, chances for employment decreased; even those who were acquitted had much less chance of being hired. In Los Angeles, Heussenstamm (1971) sought to study claims of police harassment of the Black Panthers. Fifteen "typical" students with "exemplary" driving records agreed to put Black Panther party bumper stickers on their cars. The cars were in good condition and the students drove as they normally did. They received 33 citations in 17 days, and the study had to be stopped. Phil Zimbardo (1969) left what appeared to be abandoned cars on New York City and California thoroughfares and watched to see if they would be dismantled.

The National Wiretap Commission's interest in the availability of illegal wiretaps led its investigators to call 115 randomly chosen private detective agencies in seven large cities. They identified themselves as businessmen interested in tapping a rival's phones. In more than a third of the cases, the agencies contacted offered to install the illegal taps; many that refused offered "to show the callers how to do it themselves" 13 (O'Toole, 1978:75).

Selltiz (1955) reports an early experiment that used matched pairs of Black and White diners to assess discrimination in restaurants. HUD has used an equivalent tactic in studying housing discrimination. Black and White auditors who were otherwise similar responded separately to the same rental-sale opportunities. They found Blacks were systematically treated less favorably and courteously than Whites. 14 The Supreme Court in a 1982 decision ruled that a "tester" who is misled has standing to maintain a claim for damages under the 1968 Fair Housing Act.

Jesilow and O'Brien (1980), in a study of deterrence and automobile repair fraud, had women approach randomly chosen garages with the story that they were changing residences and their cars would not start. They told the repairmen that the car battery was in the trunk of the borrowed car they were driving, and requested the battery be tested. The battery, of course, was fine. Depending on the group sampled (experimental, control, pre- or posttest), from five to twenty percent of the time a new battery was recommended.

Pontell et al. (1980) has proposed using patients with prediagnosed common ailments to study fraud in government-funded medical benefit programs. The method involves using patients with equivalent symptoms. Some of the patients are entitled to health care benefits. All are to pay out of their own pockets for services rendered after each visit. The quality of treatment received and suggested would then be rated by a panel of doctors, and records of the insuring agencies checked to see if double-billing occurs. In a related case, investigators (including a U.S. senator) for a Senate Subcommittee on Long-Term Care (1976) visited medical clinics. They complained of colds and other minor symptoms. The inquiry documented numerous examples of fraud, and inferior and unnecessary care.

While not dirty data in the sense of the examples consider here, which have a willful quality, experiments designed to test the quality of professional diagnosis and skills may make similar use of deception. For example, Rosenhan (1973) discovered that health "pseudo-patients" who checked themselves into mental hospitals with vague symptoms could be diagnosed as schizophrenic.

Infiltration or covert participant observation is another form of deception. Its main domestic users are police, industrial espionage agents, and occasionally journalists, activists, and researchers. It involves fitting into some ongoing set of activities rather than generating new organizations and activities. To the extent that the infiltrator takes a more active role, trying to consciously influence what happens, the method has some of the quality of an experiment. At one extreme, the observer functions like covert electronic surveillance, merely transmitting what is going on; at the other, is the researcher as agent provocateur. 15

Coercion

This includes a variety of means that share compulsion as the essential mechanism. People or organizations are required to furnish information under threat of various penalties ranging from imprisonment, fines, revocation of license, or withholding of goods, services, or privileges desired by the possessor of the information. Through laws, courts, and policies, government, with its power to coerce, is the major source of institutionalized discovery practices.

Investigations and hearings by Congressional, presidential, state and local commissions, and various government agencies such as the General Accounting Office and Office of Management and Budget with subpoena powers are major sources of information. In the case of historical data, most files are open to researchers after 50 years. Annual reports and reports of inspectors general can provide rich data. The Freedom of Information Act is particularly relevant to the needs of researchers, though in most states there is nothing like it at the local level. Reiss and Biderman (1980) have compiled a list of federal data sources available for the study of white collar crime.

Court records including indictments, testimony, and evidence can offer valuable information. However, one must know where to look and have resources for generating a transcript from the court record. If the case is not appealed, a transcript will be unavailable. New federal discovery rules require prosecutors to make available to the defense information in their possession relevant to the case. Grand jury data, which in general is kept secret, can be very powerful. Another good source of data lies in the routine reports that many occupational groups are required to file, e.g., for doctors (prescriptions, tissue samples), for police (bribery attempts, use of force, weapons discharge), and for those in congress (campaign contributions, conflict of interest).

Lawyers and criminal investigators may use coercive confrontation tactics to gather information. These vary from threats to subpoena, arrest, or sue, to blackmail and the use of force. (While less prevalent now in the U.S. than previously, the third degree is one such technique.)

Brief mention can also be made of a nonconventional data gathering technique which falls between the above methods: ESP. This method is apparently taken quite seriously by Eastern European and Israeli intelligence agencies, who are engaged in extensive research on it (Deacon, 1977). Police departments also have experimented with the use of psychics. While it raises interesting issues for validity, ethics, and privacy, were it to be used in social research, it is possible that, in the future, ESP may emerge as still another form of gathering dirty data. What type of technique is it? It might be classified as coercion because it is against, or independent of, a person's will; as deception because it is carried out covertly; or as an uncontrollable contingency, because of the residue or traces given off by thought and behavior which the expert draws on. It is also interesting to speculate on what form counterespionage might take.

Protecting Dirty Data

The would-be discoverer of dirty data must ask, how do organizations attempt to protect their information? The other methods of discovery we have considered have counterparts in actions taken to protect secrets.

Organizations may attempt to limit the damage from accidental or coincidental discoveries by diffusing and hiding responsibility, by having "need to know" rules (even for those who are highly trusted), by compartmentalizing activities, by using code names and a cell organizational structure, by delegating dirty work in a nontraceable way, by having mechanisms which insulate higher status persons from traceable "contamination," by eliminating witnesses, and by having contingency cover-up plans. Paper shredders and refuse burned under guard are means of thwarting garbage detectives.

Efforts to avoid informing and whistleblowing can be seen in recruitment, socialization, and sanctioning patterns. Background investigations are one means. Recruiting on the basis of ethnicity, or friends or relatives, is also thought to increase reliability. Kinship was a prominent device in underground networks in World War II and is a factor in some criminal enterprises. Loyalty may be cultivated by good working conditions and rewards, as well as appeals to shared values. In a kind of institutionalized blackmail, culpability may be built in by requiring or maneuvering employees into participation in illegal or potentially discrediting actions. To indict the organization would thus be to indict oneself. Long training, testing periods, and, as employees come to prove their reliability, gradual exposure to an organization's secrets are other devices. Contracts binding employees to secrecy and subjecting them to civil and criminal penalties for divulging information are also relevant. The CIA, for example, claims the right to censor what its former employees publish and, under some conditions, can stop them from getting royalties. Violence against informants, stigmatizing the tattletale, psychiatric labeling, dismissal, the loss of a pension, and blackballing are other devices intended to deter the sharing of secrets.

Awareness of deception and institutionalized coercion as information gathering tactics may give rise to a variety of strategic actions designed to mislead and limit what can be discovered. Thus, the deception of the investigator may be matched by the counterespionage of the target who gives out or permits false data to be discovered. The requirement that reports be filed or testimony given does not insure that they will be accurate. Awareness of legitimate and illegitimate electronic surveillance may mean electronic countersweeping, or restricting key communications to areas where bugs are unlikely. Communications can be faked, guarded, and disguised when a wiretap or bug has been discovered. There is a dialectic between discoverers and keepers of secrets, as they reciprocally adjust their behavior. A discoverer's advantage may be only temporary, or work best on amateurs who have not found a form of neutralization.

Contrasting the Methods

We have considered four general methods by which researchers can obtain hidden and dirty data. 16 How do these methods compare with respect to criteria such as ethics, representativeness, reactivity, susceptibility to the researcher's initiative, range of topics covered, validity, skill requirements and costs? Table 3 contrasts the methods with respect to these criteria. Without going over every cell in the table, let us highlight some of the major characteristics of these methods.

The major advantage of the deceptive experiment is, of course, its incredible power to pierce the protective shield of secrecy that is likely to surround those involved with dirty data. It yields primary data, is subject to the researcher's initiative, and permits testing hypotheses. With adequate resources, one can take a sample of appropriate settings or subjects and hence make a case for its representativeness (though of just what can be problematic, as will be noted).

Replication and a degree of control over the variables involved are possible (although with field experiments such control is always limited). Skill requirements are moderate, although imagination is needed to develop the precise tactics, and skill in acting may be required.

The major disadvantage of the method involves questions of ethics and the meaning of the data. There are potentially corrosive and problematic aspects of using such tactics. Deception involves important ethical issues such as lying, invasions of privacy, manipulation, and involving subjects without their consent. In getting at the dirt, one may get dirty oneself. Seeking data on illegal actions may draw the researcher into illegal activities, and he or she may face temptations not usually considered in graduate methodology classes.

For reasons of resources and ethics, a rather narrow range of issues has been covered using the deceptive experiment. For social researchers, it is best used in attempting to document a pattern of victimization of persons that involves specific actions at one point in time. The dirty data appears as a result of actions taken by the target relative to a subject, e.g., fake voter registration, discrimination in housing, employment, or law enforcement, or consumer fraud. The dirt can lie in a clear violation of laws or policies, or in a more subtle misuse of discretion. The researcher presents him or herself as a client, patient, stooge, or ally, and sees if the hypothesized behavior is, or appears to be, forthcoming. There are limits here, as the researcher is unlikely to wish to take this to a point where actual damage is done, or the law is violated, as with unnecessary surgery, actually paying a bribe, or purchasing contraband.

It is much easier to become involved in an ongoing setting at one point in time than to create an entirely new setting. Elaborate hoaxes, such as fake criminal enterprises run by police, generally go far beyond the resources and competencies of the researcher. The cost, skill requirements, and risks of discovery increases, the more complex the deception and the longer it goes on.

The data from deceptive experiments can be questioned with respect to both their validity and generalizability. One source of error lies in the strategic actions of those with secrets to protect. They may discover the experiment and take what amounts to counter espionage actions, giving out deceptive information themselves. Thus a finding of no dirty data must always be considered in light of the question, "was the target suspicious and hence behaving in an atypical way?" A second source of error lies in reactivity or entrapment, as subjects respond to subtle or obvious pressure from the researchers. The degree of passivity is an important factor here. Does the researcher merely offer an opportunity, or go beyond this to actively encourage the subject to use it?

Even if the results are valid, the intervention may have an artificial quality, making it unclear what it should be generalized to. As with any experimental undertaking, even those done in natural settings, there is the question of whether it is representative of real-world settings or merely other experimental settings. For example, this seems to be the problem with discovering that persons will respond criminally to deceptively provided opportunities that are appreciably more tempting than those likely to be found in reality, or discovering (using Black and White investigators) that there is housing discrimination against Blacks in elite White areas in which few Blacks may wish, or be in a position, to live in. An important factor here (and one recognized by the courts in criminal cases) is how closely the artificial setting corresponds to those in the real world.

Deception in the case of infiltration shares many of the advantages and disadvantages of the deceptive experiment, yet some differences can be noted. Since it goes on for a longer period of time and is more diffuse, the range of topics or issues covered is likely to be much broader than with experiments.

The risks, ethical issues, and temptations faced are likely to be far more serious than with the onetime experiment, especially those where the researcher is offered as potential victim. Joint illegal actions may be demanded as the price of access, discovery of one's true identity can lead to physical harm, and the researcher with knowledge of others' wrongdoing can be compelled to testify. The researcher does not have the immunity or right to confidentiality that law enforcement, medical, and religious professionals have.

Since there is usually only one covert participant, cross-observer data cannot be a source of validation. Representativeness is also likely to be an issue, since access is often a matter of idiosyncratic local factors related to previous work experience or friendship patterns.

Data obtained through the various means of institutionalized coercion are unlikely to be otherwise available. In general, they do not raise profound ethical issues, are relatively inexpensive (generally involving the cost of photocopying, though where elaborate FOIA searches or the transcription of court records are required, expenses can be large). The data may be available on a scale far beyond what the researcher could normally obtain.

However, the researcher has less initiative, and replication and control of relevant variables are rarely possible. Merely locating the information can be a problem. Court data are not centrally located nor indexed in ways that benefit the researcher. Even with FOIA requests, the researcher needs to know what he or she wants. Because such data is initially gathered for purposes other than research, it tends to have the usual drawbacks of secondary data. This sometimes is avoided by the researcher being able to advise commissions, committees, legislative bodies, and courts as to exactly what data should be gathered. With regularized reporting requirements that apply categorically, representativeness is not an issue. However, with grand jury or court proceedings, issues of representativeness may loom large. Cases for which data is available may represent particularly grievous instances, or idiosyncratic prosecutorial factors.

The validity of data gathered under coercion must be carefully scrutinized, given a conflict setting and the likelihood of strategic responses. In the case of court testimony, those facing criminal charges may have a strong incentive to lie, particularly if they are offered immunity. With respect to records, false or misleading reports may be filed. The former may be a natural response for agencies (e.g., national security or law enforcement) professionally involved in dissimilitude. Awareness that one's documents are subject to the Freedom of Information Act may mean that far less is written down, or that written records are destroyed as soon as possible 17 (thus, records prior to the 1966 passage of FOIA, or the stronger version in 1974 following Watergate, are likely to have the greatest validity). The concept of "maximum deniability" that surfaced during the Congressional Hearings on U. S. Intelligence activities in the 1970s certainly predates reporting requirements and the FOIA. Even where things are written down, observer should be on guard. As a federal employee with extensive experience in the handling of the kinds of records researchers seek from government observes, "the 'cooking [falsifying] of records' has gone on as long as there has been any government anywhere" (The New York Times, July 11, 1982)

As a means of misleading their competition, businesses may patent their failures. Through tricks and subterfuge, reports filed may be technically, but not actually, correct. Thus, assets or conflicts of interest may be hidden. Silent partners may not be listed. Those listed as owners may merely be fronting for others. Bribe offers and payments may be made indirectly through lawyers. The most serious offenders may never file reports (for example, some businesses avoid licensing and reporting requirements by continually changing their name and place of business).

Information, which is given voluntarily, avoids many of the ethical issues around deception and coercion. To take the case of the whistle-blower, the ethical issues fall primarily on the bearer of the data, and much less on the researcher who uses it. Since the whistle-blower tends to come forward on his or her own, reactivity is not likely to be an issue. The range of topics covered is likely to be broad, the method is inexpensive, and it does not require a highly skilled researcher to use it.

Yet it clearly has disadvantages. As in the accident, the researcher's role is rather passive. The researcher usually must wait for the whistle-blower, though good field workers can sometimes gain equivalent information from their informants. Problems of representativeness are often present. It is difficult to know what the account of the whistle-blower represents. The rarity of whistle-blowing can be used to argue that the very atypicality of the case was what generated whistle-blowing in the first place. Great care must be taken with respect to validity. In coming from the horse's mouth, whistle-blowing can have a persuasiveness which anonymous, or outside, data sources lack. The personal motives of the whistle-blower can lead to distortions, exaggerations, and outright falsification. Skepticism and a critical attitude are necessary in the face of seeming data gifts, whether from whistle-blowers, accidents, or informants. Are they what they appear to be? The researcher must be especially careful when the whistle-blower's account supports the researcher's own ideological stance toward the organization or issue in questions.

The uncontrolled contingency does not present serious ethical problems, though dealing with trace elements may involve violations of privacy. Since the data appears independently of the actions of the researcher, reactivity is not a problem, and it is inexpensive. Nor is validity likely to be an issue (though in highly controversial or conflictual areas there is always the possibility that a trace element or accident is not what it appears to be and was created to cover up something else, to damage someone, or to pursue self-interests). The researcher must always ask, was the accident faked, the evidence planted, the dirty artifacts, or data, counterfeited or contrived?

Unlike the other methods, the uncontrolled contingency is not a strategy that the investigator can initiate: replication and control of extraneous variables are rarely possible. Rather, it is an opportunity, which the researcher reacts to, although the investigator is aided by knowing where to look.

The occurrence of an accident is, of course, no guarantee that dirty data is present, nor, if it is, that it will become available. Its occurrence may be covered up and access to data about it denied. 18 The discovery of cover-ups raises the intriguing question of the ratio of discovered to successful coverups. Is success typical and discovery primarily due to incompetence or bad luck? Even if this is not the case, issues of representativeness plague generalizations from accidents. It is difficult to know what they are representative of.

When data are available, there may be major problems with respect to their meaning and interpretation. Seeing smoke does not necessarily tell you what kind of fire is present, nor what caused it, nor how to put it out. Facts do not speak for themselves, though some seem to whisper louder than others. This is part of a more general issue in social problems research. It is much easier to document a problem than to explain it.

Accidents can be a key to discovering that dirty data is present. But the collection of data is likely to depend on further actions. The publicity around an accident can mobilize resources and political support to set coercive data collection procedures (courts, grand juries, commissions) in motion.

Some Research Issues And Needs

My purpose in this paper has been to call attention to a type of data with particular relevance to social problems research and to contrast some methods for obtaining it. Substantively, the topic of dirty data touches many areas, including the sociology of knowledge and science, secrecy, stratification, face-to-face interaction, mass communications, and deviance and social control. It suggests a number of researchable questions:

From a sociology of knowledge perspective, what are the implications of the differential availability of information? With respect to both popular consciousness and scientific knowledge, what are the areas where we have valid information (whether as a result of systematic social research or personal experience), as against areas where information is invalid or lacking? To what extent is our assessment of the contours of information validity correct?
How is our image of the world distorted because facts are hidden or deceptively presented? What are the implications of our greater weakness as scientists or interveners in the face of hidden and dirty data?
How does this type of ignorance (withheld or deceptively presented information) compare in form, process, and consequences to ignorance involving errors of measurement, or theory, things that are knowable but undiscovered (e.g., topics that are dangerous to study), or to things that are unknowable? For example, contrast (before discovery) what was believed about the Soviet downing of Gary Powers U-2 spy plane, the Tonkin Gulf bombing, the My-Lai massacre, black lung disease, destabilization and the CIA, police and protest groups, and Watergate with what were, or remain scientific anomalies, e.g., meteorites and sea serpents (Westrum, 1978), or individual or mass delusions. What does it mean that we tend to lack adequate terms for even characterizing types of empirically invalid beliefs?
Does the greater difficulty in studying dirty data topics (including method limitations) mean we will always have to give greater reliance to the accounts of novelists or journalists, and will these topics always be more contentious because the data are poorer?
Is there a knowledge product (characteristic of much social problems research) which stands between the novelist, journalist, or detective, and the pure scientist, while drawing from both groups? Such knowledge shares with the former the need to rely on non-rigorous and questionable data sources, the desire to raise issues, sensitize the public, and document problems, and a frequent reliance on individual cases. It shares with the latter respect for the logic of explanation and the need for empirical verification and generalization.
How do dirty data secrets differ from other types of secrets?
How is dirty data distributed in the society?
How do (1) its extensiveness, (2) its form, (3) the ability to protect it, and (4) its consequences relate to social stratification? The presence of dirty data is certainly not restricted to elites, though the social costs of their dirty data and their ability to prevent discovery, are likely much greater than is the case for subordinant groups. What institutional areas or organizations produce the most, e.g., contrast commerce with art, or politics with science (though recent revelations suggest that deception in science is far more widespread than is generally acknowledged). Is an organized adversary present who stands to be harmed by concealment? If so, what are the implications of this?19
How is access and accountability for discrediting information related to stratification? On the one hand, higher status people may give the orders, but on the other, they are likely to be insulated from responsibility. Thus, middle or lower level personnel are more likely to be sanctioned. In addition, certain types of dirty-work are routinely given to lower status members and serve to reenforce their status. Where physical actions are involved, workers close to the scene are likely to know what is going on. Butlers, servants, and maids have historically been the first to know, and are in a good position to be spies and blackmailers 20
How conducive is "insulation from observability" (to use Rose Coser's [1961] long neglected phrase) to rule violations? Put the other way, is sunlight always the best remedy, as Justice Brandeis suggested?
What is the connection between keeping dirty data secret and legitimacy? A central element in sustaining societal myths (beyond staying in power) may be concealing dirty data. On the other hand, openness (e.g., Jimmy Carter's "I will not lie to you" speech) can be a device for enhancing legitimacy.
What unintended consequences may accompany the recent efforts to increase openness? For example, the major users of the FOIA are businesses trying to learn about government regulatory practices and those in prison trying to find out who informed. Do open meetings mean less candor and more "premeeting meetings"? Does living in a goldfish bowl mean more and better concealed secrets, as Bennis (1980) argues? Are there conditions where "ignorance is bliss" and "what you don't know, won't hurt you" (Moore and Tumin, 1949)?
What are the consequences of making it easy to file anonymous complaints? Similar issues are raised with "shield laws" offering immunity to journalists who refuse to reveal their sources. On the one hand, it could be argued that not having the identity of the informant known will produce important and otherwise unavailable information. On the other hand, the very anonymity can stimulate invalid data and trivia, overloading the system, slandering reputations and wreaking havoc, as those who are vindictive, ruthless, irresponsible, cranks, saboteurs or witch hunters take advantage of the protection it offers. Important issues are the ease with which allegations can be kept secret before and (should they prove unfounded) after efforts at verification. Are problems reduced when the identity of the information source is known, but not made public, or is known by trusted intermediaries who are not formally a part of the organization interested in the information? To what extent are problems reduced with a two-step system, where the identity of the complainant is known by certain officials but not made public?
How do types of society compare? How does a society's degree of openness affect the amount of dirt that there is? When it is relatively easy to get, as in the U.S., do people become indignant and demand change, or cynical and tolerant because "it's everywhere"? Under what conditions might publicizing it serve to stimulate the misbehavior of others through a forbidden fruit effect?
How does it relate to scandal and social reform?
How are broad societal changes, whether social, cultural, or technical, affecting the ability to protect and discover dirty data? Are we becoming a more open society? Is the relative amount of dirty data declining or should it best be seen as a moving equilibrium?

Beyond substantive issues, awareness of the topic has practical implications. Social researchers appear to make much less use of dirty data discovery methods than do journalists, detectives, and even historians. This partly reflects differences in goals, resources, and norms. Table 4 [not shown here; author will be glad to send or show original] contrasts types of secrecy investigator by goals (scientific understanding, news, social reforms, planning a counterstrategy, prosecution, and resources for coercion).21

Unlike the social researcher, most other secrecy investigators are themselves protected by secrecy, and need publicly document neither their sources nor their methods. These may be protected legally and by professional standards. Indeed, even results may be kept secret or used only as needed. But, given the demands of scientific communication, the researcher is expected to go public. He or she must describe where the data comes from and how it was collected. The standards of evidence for making a scientific claim are higher than for journalism or the law. In the case of journalism, for example, issues of causality, methodological failings, and representativeness generally receive little attention. Social researchers also differ from many government agents in that they cannot offer large sums for information, put together a grand jury, issue a subpoena, compel testimony, offer immunity, nor legally wiretap. 22 The closest we can come is occasionally advising government bodies, as with special commissions or courts that can do these things. Whether considering government agents, or journalists, our resources are more limited and our standards in general are more restrictive with respect to using coercion and deception. 23 Academic norms of civility and gentility operate against some of the more roughshod methods of others in the dirty data discovery business. 24

But beyond these differences, I think social researchers also make less use of these techniques as a result of lack of awareness and familiarity, and perhaps a generally less skeptical and cynical occupational world view than is the case for police, lawyers, and investigative journalists.

There is a small literature on dirty data research issues, particularly in considerations of fieldwork methods (e.g., Social Problems, Feb. 1980), social psychological experiments, and in the substantive areas of political sociology (around the study of power elites) and deviance-criminology. But this literature is restricted and has not dealt adequately with the new issues and opportunities of the last decade. Our methodology textbooks tend to be sadly lacking in guidelines for such research. We can learn a considerable amount from those professionals who routinely seek to discover dirty data. Their results may offer rich materials for secondary analysis. We might also make greater use of their methods for primary analysis, as some historians have done using the FOIA.

The array of methods we use could be broadened. Methodology texts and course should give more attention to these methods and resources for obtaining dirty data. We should become as familiar with the University of Missouri's CIRE (Center for Investigative Reporters and Editors), as with the University of Chicago's NORC. Students should be taught how to use the Freedom of Information Act (e.g., see Committee on Government Operations, 1977; Center for National Security Studies, 1979), just as they are taught how to draw a sample. We should communicate to our students the joy of discovery and also what a discovery motion in court is. Berman Associates' checklist of Congressional Hearings, and the monthly list of Government Accounting Office reports, should become standard references. We should scan the periodic lists offered by the Nader-sponsored Freedom of Information Clearing House, just as we scan listings of government and foundation grants. The Carrolton Declassified Documents Reference Service should become as well known to us as the Yale Human relations area files. Just as some of us have learned how to contact the Census for demographic data, we should learn how to contact the Clerk of a court for legal documents. Civil and Criminal Court indices should be as well known as those at the Roper Center or the Inter-University Consortium.

While researchers sometimes consult "Who's Who?", they should also be consulting an obscure, formerly classified, publication put together by CIA office of security specialist Harry J. Murphy, entitled, "Where's What-Sources of Information for Federal Investigators." This is a marvelously rich compendium of sources of personal information such as private directories and government files. Methodological stalwarts, such as Lazarsfeld and Hyman, must make room for outsiders such as Mollenhoff's Investigative Reporting (1980) and Williams' "Investigative Reporting and Editing" (1977). First Principles, Mother Jones, and 7 Days should take their place alongside the more established academic journals whose contents we skim.

While I do not suggest that we learn to wire tap, if wire tap data presents itself, we should not necessarily ignore it. Furthermore, we should know where to look to find out if it is presenting itself. As Horatio Alger noted, the knock of opportunity does little for those who do not hear it. The ethical problem here is somewhat similar to that raised by whether or not a university or church should accept tainted money (as from a slum lord).

While the ethical issues are not to be taken lightly, and will limit us relative to actions that may be justified in war time, or even those routinely taken by police, I think sociologists can go further and be more imaginative than we have been in the kinds of natural field experiments we attempt. There is a need for standards and discussion in this area to be sure (Keyman, 1977; Klockars and O'Connor, 1979). However, perhaps different standards with respect to deception, privacy, informed consent, and avoiding harm to research subjects ought to apply when the subjects themselves are engaged in deceitful, coercive, and illegal activities, and/or where one is dealing with an institution which is publicly accountable.25 Even without resorting to ethically questionable methods, an astounding amount can be discovered through intelligence, knowing where to look and what to look for, diligence, and the cultivation of sources. The career of I.F. Stone, with its heavy reliance on congressional hearings, attests to this. Preferring publicly available information, and without resorting to deception, he has been a one person discover machine.

Many of the topics dear to the hearts of social problems researchers could be better illuminated were we to make greater, through restrained, use of methods for discovering dirty data. Yet the researcher in this area must judiciously walk a hazy line between the unacceptable extremes of taking the world at face value and believing that what is unseen is unimportant, as against thinking that nothing is what it appears to be and that whatever is hidden must, therefore, be significant. The presence of secrecy is a guarantee of neither theoretical nor social relevance. Even where dirty data is scientifically and socially relevant, respect for the law and individuals' rights must be carefully balanced against the scholar's concern with discovering the truth and contributing to reform. There are many instances where the former will preclude the latter. In spite of such concerns, increased attention to dirty data methods, topics, and issues is one factor required for better understanding of social problems.

Back to Home Page | Top | Notes | Tables

Notes

In noting its occupational ubiquity, Hughes observes, "It is hard to imagine an occupation in which one does not appear, in certain repeated contingencies, to be practically compelled to play a role of which he thinks he ought to be a little ashamed morally" (1971: 343). There is of course an important element of cultural relativism with respect to specifics. Thus kickbacks in the U. S. are hidden and illegal, while in some non-Western countries they are simply seen as a regular part of business overhead. In this chapter we assume that the researcher and target of the research share the same definition of what constitutes dirty data (or at least share the view that the broader culture will see the behavior in question as dirty).
Among factors that confound the expected dirty data-secrecy relationship: (a) people whose power comes from the ability to coerce or threaten others must occasionally make good on the threat, and this must be publicized to the relevant target; (b) situations where persons have adequate power that precludes their having to worry much about covering up dirty deeds such as the ruthless dictator; (c) situations that call for selective exposure of the action in order to attract clients or coconspirators (e.g., vice operatives need to let customers know about their services); (d) situations where generators of dirty data have immunity from sanctioning. This may be because they help police, as with informants or because of strict rules of evidence. The distinction between procedural and substantive guilt (Skolnick, 1966) is relevant here. For example, police often know who the fences, gamblers, and prostitutes are, but arrests leading to prosecution are rare, partly because it is hard to collect legally admissible evidence; (e) Situations of normative dissensus. What is seen as discrediting by one group may not be seen that way by another group. Counter culture groups or rebellious personalities may not hide, and may even flaunt, as a matter of principle and politics, behavior or facts about self that outsiders see as discrediting. Kitsuse (1980: 3) considers the last decade's increase in the willingness of many of such persons to "1/4 declare their presence openly and without apology to claim the right of citizenship."
How data is labeled to an important extent depends on the point of view of the observer. Expectations and standards have changed, making general historical comparisons difficult. Good measures on the actual amount of dirty data and the ease of discovery are lacking. This question involves highly debatable beliefs about the idea of progress and the state of morality. With increased complexity and expanded efforts at intervention (especially on the part of government), the potential for things to go wrong, and hence the possibility for leaks or whistleblowing , may be increased.
Accessibility and the sheer amount of dirty data may of course be inversely linked. Judged from the sunshine theory of social control, openness should serve to reduce illegal and questionable actions. The conditions under which it does this, rather than simply generating clever forms of neutralization or displacement, is an important question. Just as high accessibility may reduce the amount, the amount may have been a prior factor in affecting the degree of accessibility. Thus the increased access to dirty data of recent years might be interpreted as a response to pressures for reform that have emerged in light of a vast increase in dirty data (beyond the fact that there is more dirty data available for discovery). The ratio of discovered to hidden dirty data (Types C and D of Table 1) in different conditions and settings is an issue well worthy of study.
For example, new or improved forms of bugging (in some cases using lasers), wire taps, videotaping and still photography, remote camera systems, periscopic prisms, one-way mirrors, various infrared, sensor, and tracking devices, truth serum, polygraphs, voice print and stress analysis, pen registers, ultraviolet radiation, and aircraft and satellite surveillance.
To be sure, as the recent report (1979) of the Privacy Study Commission makes clear, the new data collection is a mixed blessing with an ominous double-edged sword potential. On balance, its negative aspects may far outweigh the positive. But it is not about to go away. We should seek to limit abuses while maximizing implications for an open society.
The FOIA was passed in 1966. Following Watergate, it was amended and new protections were added against misuse of the classification system (e.g., using a "national security" classification to cover illegal behavior). I don't wish to suggest that the agencies subject to the FOIA are cheerfully conforming to the letter and spirit of the law. Beyond lobbying efforts to weaken the law, institutionalized evasion practices (and cover-ups) are certainly present. Downes' (1967) law of countercontrol certainly applies to the new openness efforts. But implementation issues aside, the FOIA is of tremendous symbolic and practical importance to researchers. It offers statutory recognition to the right of freedom of access. It involves the concept that the First Amendment includes the right to have access to ideas and information, as well as the right to communicate. A government agency which wishes to withhold information now has the burden of proving that it is entitled to do so.
Lenzner (1980: 104) observes, "Lawyers are not well trained in obtaining facts1/4 Reporters have a unique capability for getting people to talk to them." Sociologists have much to learn from both: from journalists, how to better discover what happened at particular times and places; from lawyers, how to use court records, procedures, and resources.
Judge Burt W. Griffin, an attorney for the Warren Commission, argues that given American civil liberties and the requirement of proof beyond a reasonable doubt for criminal conviction, that "1/4 it is virtually impossible to prosecute or uncover a well-conceived and well executed conspiracy." However, the occasional successful prosecution of a sophisticated conspiracy "1/4 almost always results from accidental discoveries" (Blakey and Billings, 1981: 397-398).
Webb et al. (1966: 35) observes that "physical evidence is probably the social scientist's least used source of data, yet because of its ubiquity, it holds flexible and broad gauged potential." As attention to environmental issues increase, it becomes even more important.
According to one survey, between 1945 and 1973, over one hundred memoirs were written by persons holding government positions in the area of national security (Halperin, Bureaucratic Politics and Foreign Policy, 1974, vol. 11: 317-321). A capitalist economic system and a free press, in providing both incentive and means, are very conductive to the dissemination of secrets.
Such give-away information was probably more common in earlier decades, before public relations specialists and lawyers became such prominent parts of large organizations.
Another proposal made to the National Commission on Gambling (which was not followed) was to have investigators take a large number of taxi rides in various cities and ask to be taken to card games.
This is reported in Wienk, et al., (1979). Among the first race relations researchers in this genre was LaPierre. His classic 1934 study did not document discrimination against Chinese travelers. It did, however call attention to the frequent gap between attitude and behavior. In a time when prejudice was more openly expressed, he found over ninety percent of two hundred fifty hotels and restaurants responded negatively to his questionnaire which asked if Chinese customers were welcomed. Yet in actuality visiting these businesses, he and a Chinese couple were refused service only once.

A recent trend in race relations research involves hidden manipulations and unobtrusive measures of racial response in field settings. Unobtrusive studies of race and helping, aggressive, and nonverbal behavior and find greater discrimination than would be expected from survey data (Crosby, Bromley and Saxe, 1980). Since the subjects of such research tend to be chosen from diffuse publics (a person walking on the street or shopping in a supermarket), rather than organizations thought to discriminate, the direct relevance of such studies to social problems research or amelioration is limited.

The former is rare, regardless of intentions, since the investigator can rarely be certain what impact his or her presence has on the nature of the data elicited; for example, the issues raised by three infiltrators into a small sect predicting the world's end (Festinger et al., 1956).
There are, of course, other reasons for using such methods beyond the discovery of dirty data. Webb et al. (1966) scarcely mention it in arguing for the use of unobtrusive measures such as physical traces of contrived observation. Instead, they advocate such methods because of belief in "multiple operationalism." Using several methods permits stronger conclusions and some correction for the limitations (e.g., reactivity, faulty memories) present when only one method, such as the interview, is used.
The FBI was destroying records on a massive scale until a 1980 Federal District Court ordered cessation and the development of record retention plans. After the strengthening of the FOIA, some agencies shifted from records in narrative to checklist form (e.g., yes-no).
For example, the horrifying cover-ups documented in "Paul Jacobs and the Nuclear Gang" presented on public television in 1980. With respect to denial of data, the Union of Concerned Scientists recently had to resort to the FOIA to force the Nuclear Regulatory Commission to release its secret "Nugget File." The file details 230 failures and accidents at nuclear power stations between 1966 and 1976.
While perhaps not adversarial in the conventional sense, a fascinating group to study here would be the Aviation Safety Institute. This independent watch dog organization was set up by a test pilot who had little faith in the FAA and its regulatory safety practices. It publishes a newsletter entitled "Monitor." The "cornerstone" of the organization's work is an "anonymous reporting service." Since 1973, 40,000 persons dialed its toll free phone number to report "unsafe aviation conditions or unsafe acts." Verified accounts are reported in the newsletters. It is interesting that even for something like air safety, when the costs of failure are visible and enormous, there appears to be a need for an independent information gathering organization using an anonymous reporting service.
Hughes (1958:51) observes, "it is by the garbage that the janitor judges, and, as it were, gets power over the tenants who high-hat him. Janitors know about hidden love affairs by bits of torn up letter paper; of impending financial four-flushing by the presence of many unopened letters in the waste."
Other professionals excluded from this table, such as religious and psychological counselors and physicians, also deal with secrets. However, this tends to involve personal rather than organizational data, is (at least ideally) in a helping rather than an adversarial or conflict relationship, and publication to other audiences is not a goal. There are exceptions, however, as with the requirement that health workers must report cases of venereal disease.
This is a nice example of how method shapes knowledge. What would the state of social problems knowledge be if we could do these things?
This may seem less clear around deception. But the very fact that there is a social science literature debating this indicates its sensitivity. The divergent direction in social science vs. journalism is interesting here. Investigative journalism has become well established, while social scientists seem to be moving in the other direction with concerns over human subjects research.
Social scientists have the advantage of not having to be so concerned about the validity of any specific case, since their interest is in the aggregate, in patterns, and in developing ideal types. However, this is balanced by the need to have a larger number of cases before one can address relevant audiences with authority.
We lack a clear and agreed upon moral framework which can mesh, balance, or resolve conflict among these elements. Informed consent, for example, can be seen as an equitable measure in that it offers less powerful groups the right not to be studied, a de facto right which higher status groups have tended to have all along. Yet there may be other costs in its categorical application. Duster, Matza, and Wellman (1979:140-141) observe, "in some situations, 'informed consent' may in fact impede the protection of some human subjects, for example, when the question before researchers-and the public-involves possible unethical behavior, like fraud and discrimination1/4 to mechanically apply to powerful institutions a bureaucratic rule originally meant to protect the powerless, forgets the reason behind the reform." On the other hand, who is to decide, and by what decision criteria is it appropriate to conclude that a research subject may be deceived? One standard is a kind of reverse golden rule (Marx, 1983). Here persons who violate the public trust are appropriate subjects for investigative tactics that would otherwise be inappropriate. The great Catch-22 comes with the (large?) number of cases for which it is not possible to know beforehand that violations are occurring. To exempt such persons from deceptive tactics until probable cause appears makes it unlikely that the wrongdoing will be discovered.

Back to Home Page | Top | Notes | Tables

Tables

TABLE 1: Types of Data

A. Nonsecretive and nondiscrediting data: Routinely available information.

B. Secretive and nondiscrediting data: Strategic and fraternal secrets, privacy.

C. Nonscretive and discrediting data:

1.sanction immunity,

2.normative dissensus,

3.selective dissensus,

4.making good on a threat for credibility,

5.discovered dirty data.

D. Secretive and discrediting data: Hidden and dirty data.

TABLE 2: Methods of Obtaining Hidden and Dirty Data

Deception

Experiments

infiltration

covert-surveillance

information gathering

Coercion

Institutionalized discovery practices

grand-jury

Freedom of Information Act

discovery motions

record keeping requirements

manipulation

hypnosis

chemicals

ESP

Volition

Whistleblowers

leaks

gossip

informants

open field work

archives

personal documents

Uncontrollable Contingencies

Accidents

mistakes

victims

coincidences

traces

fall-out

residues

remnants

TABLE 3: Some Criteria for Contrasting Methods for Gathering Dirty Data

CRITERIA

Method

Ethics: Researcher's Standpoint

Representativeness

Reactivity

Susceptible to Researcher's Initiative

Range of Issues or Topics Covered

Experiments

Problematic because of deception and tampering without consent

With sufficient resources, not an issue

A problem—"Experimenter effects"

Yes

Narrow

Infiltrators

Problematic if researcher is involved in disguised participation observation, less if uses accounts of others

Often problematic, since access is rarely random

A problem

Sometimes

Broad

Uncontrollable contingencies

Not an issue, other than privacy questions

Likely a problem, hard to determine how typical an event is

Not an issue

No, except for reading trace elements

Narrow

Whistleblowers

Ethical issues fall mostly on the whistleblower

Hard to determine, reasons to expect person is atypical

Assuming initial good faith on whistleblower’s part, not a problem

No, though institutional mechanisms may facilitate

Broad

Open field work

Can be problematic insofar as it involves conflict between loyalty to subjects and demands of outside society

Usually problematic, but with sufficient resources and access may not be

Can be a problem, but more amenable to control than with infiltrators

Yes

Broad

Matters of public record; Freedom of Information Act

Generally not an issue

Hard to determine, with respect to both what is recorded and what survives

An issue, insofar as subjects anticipate their data will become public and may shape it accordingly

Basically, no; researcher must pretty much take whatever was written down and made public

Intermediate

Experiments

Not an issue except for problems of reactivity

Moderate

Moderate to expensive

New data

Infiltrators

Can be problematic—hidden agendae, lack of cross-observer validation

Good acting ability, willingness to take risks, role conflicts

Inexpensive, moderate if researcher is infiltrator

Uncontrollable contingencies

Generally not an issue

Moderate

Inexpensive

Lends self best to discovery rather than explanation

Whistleblowers

Researchers must be skeptical and ask what hidden agenda the informant may have

Does not apply

Inexpensive

Open field work

Less of an issue

Highly specialized

Moderate

Unlike other methods, does not assume a conflict relationship

Matters of public record

Less an issue of validity than relevance of what gets written down; given knowledge that it may become public can generate methods of institutionalized evasion

Minimal, but you need to know what you are looking for

Generally inexpensive

After-the-fact data, you need an idea of what to look for/ask for, can be a useful followup to leads from uncontrollable contingencies or whistleblowers

Back to Home Page | Top | Notes | Tables

You are visitor number

to this page since November 1, 1999.

A. Nonsecretive and nondiscrediting data: Routinely available information.
B. Secretive and nondiscrediting data: Strategic and fraternal secrets, privacy.
C. Nonscretive and discrediting data:
	1.sanction immunity,
	2.normative dissensus,
	3.selective dissensus,
	4.making good on a threat for credibility,
	5.discovered dirty data.
D. Secretive and discrediting data: Hidden and dirty data.

Deception
	Experiments
	infiltration
	covert-surveillance
	information gathering
Coercion
	Institutionalized discovery practices
		grand-jury
		Freedom of Information Act
		discovery motions
		record keeping requirements
	manipulation
		hypnosis
		chemicals
		ESP
Volition
	Whistleblowers
	leaks
	gossip
	informants
	open field work
	archives
	personal documents
Uncontrollable Contingencies
	Accidents
	mistakes
	victims
	coincidences
	traces
	fall-out
	residues
	remnants

CRITERIA
Method	Ethics: Researcher's Standpoint	Representativeness	Reactivity	Susceptible to Researcher's Initiative	Range of Issues or Topics Covered
Experiments	Problematic because of deception and tampering without consent	With sufficient resources, not an issue	A problem—"Experimenter effects"	Yes	Narrow
Infiltrators	Problematic if researcher is involved in disguised participation observation, less if uses accounts of others	Often problematic, since access is rarely random	A problem	Sometimes	Broad
Uncontrollable contingencies	Not an issue, other than privacy questions	Likely a problem, hard to determine how typical an event is	Not an issue	No, except for reading trace elements	Narrow
Whistleblowers	Ethical issues fall mostly on the whistleblower	Hard to determine, reasons to expect person is atypical	Assuming initial good faith on whistleblower’s part, not a problem	No, though institutional mechanisms may facilitate	Broad
Open field work	Can be problematic insofar as it involves conflict between loyalty to subjects and demands of outside society	Usually problematic, but with sufficient resources and access may not be	Can be a problem, but more amenable to control than with infiltrators	Yes	Broad
Matters of public record; Freedom of Information Act	Generally not an issue	Hard to determine, with respect to both what is recorded and what survives	An issue, insofar as subjects anticipate their data will become public and may shape it accordingly	Basically, no; researcher must pretty much take whatever was written down and made public	Intermediate
Experiments	Not an issue except for problems of reactivity	Moderate	Moderate to expensive	New data
Infiltrators	Can be problematic—hidden agendae, lack of cross-observer validation	Good acting ability, willingness to take risks, role conflicts	Inexpensive, moderate if researcher is infiltrator
Uncontrollable contingencies	Generally not an issue	Moderate	Inexpensive	Lends self best to discovery rather than explanation
Whistleblowers	Researchers must be skeptical and ask what hidden agenda the informant may have	Does not apply	Inexpensive
Open field work	Less of an issue	Highly specialized	Moderate	Unlike other methods, does not assume a conflict relationship
Matters of public record	Less an issue of validity than relevance of what gets written down; given knowledge that it may become public can generate methods of institutionalized evasion	Minimal, but you need to know what you are looking for	Generally inexpensive	After-the-fact data, you need an idea of what to look for/ask for, can be a useful followup to leads from uncontrollable contingencies or whistleblowers