Chapter 3: Research Methods

1. Overview

Based on the technological and sociological perspectives presented in Chapter 2, I embarked on two research activities: (i) observing the deployment and use of current technologies in an inter-governmental context, and (ii) experimenting with new and emerging technologies in a laboratory setting. This hybrid approach recognized the intertwined nature of technologies and organizations in geographic information infrastructures. Furthermore, by observing organizations in the recent past and building tools in the laboratory for the near future, I sought to build an understanding that would hold across a range of contexts and remain valid for some time to come.

a. Case studies: contextual views of information sharing infrastructures

In the first phase of the research, I studied three interagency infrastructures for information sharing related to large shared natural resources: the Great Lakes, the Gulf of Maine, and the Columbia River system. The case study traces the design choices people made in building these infrastructures, and the infrastructures’ patterns of growth and change, so as to understand their role in information sharing, collaboration, and consensus-building among the organizations involved. Section 2 below details my choice of cases and the methods I used for studying and learning from them.

b. Prototype development: anticipating the effects of emerging technologies

The second phase of the research drew on a prototype networked information service I built as part of the US National Spatial Data Infrastructure. This service provides detailed geographic data and metadata to users via the World Wide Web, and embodies several design choices likely to facilitate high-quality information sharing. As such, it provided a tangible response to some of the needs arising from the case study, and it suggested the trends likely to bring radical changes in the meaning and modes of information sharing in years ahead. Section 3 below sketches the prototype design and development effort.

2. Organizational case studies

The first phase of research was a study of three inter-governmental geographic information infrastructures. I sought to compare the design, implementation, and evolution of these information sharing infrastructures, understand their patterns of structural change over time, and document any specific new tasks or processes enabled by information sharing. I also wanted to focus on cases that could serve as examples for a wide range of organizations, and where geographic information sharing seemed likely to grow in its impact on the tasks and processes of participating agencies. These goals, and the need to study a complex topic within its institutional context, led me to adopt a case study strategy (Yin, 1984). The next few paragraphs sketch the criteria I used for evaluation and comparison, my choice of cases, data collection methods, and methods for analysis and synthesis.

a. Criteria for evaluation and comparison

I chose four criteria to evaluate and compare geographic information infrastructures:

The impacts of the infrastructure, in terms of changed tasks (jobs performed that would have been infeasible without the infrastructure), or changed processes (jobs performed differently thanks to the infrastructure).

The size of the infrastructure, based on its data holdings, traffic volumes, budgets, etc.

The quality of the shared information, in terms of concurrency and timeliness, its precision and accuracy, and its encapsulation (details below).

The quality of the information-sharing mechanism, in terms of reciprocity, scalability, and non-intrusiveness.

These last two criteria require some clarification. First, the quality of shared information for a particular use depends on its concurrency and timeliness, its precision and accuracy, and its encapsulation:

Concurrency and timeliness: how well does the update frequency of the shared information match the change frequency of the phenomenon it represents, or that required by users for their tasks? For instance, given that weather patterns change every few hours, a newspaper’s weather report has a lower concurrency than a National Weather Service broadcast. Timeliness depends on the user: a mariner might listen to the Weather Channel continuously to know when to return to shore, but a climate researcher may only be interested in yearly summaries.

Precision and accuracy: how exactly specified is the information (decimal digits, spatial resolution) and how correctly does it describe phenomena (percent error, reliability)? For instance, my digital watch tells time with much more precision than my antique grandfather clock, but if I set it wrong, it may give me far less accuracy.

Encapsulation: how much effort is needed to use the shared information once it’s obtained? Not too long ago, the US Census Bureau distributed its data as strings of EBCDIC numbers on 9-track magnetic tapes; any use required lengthy parsing and repackaging. It now puts out much more encapsulated data: its Web sites let anyone query and retrieve Census data in a variety of formats, or even create demographic maps on demand.
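
As a rough illustration, the concurrency and timeliness criteria above could be expressed numerically. The sketch below is an assumption made for illustration only: this chapter defines the criteria qualitatively and gives no formula, and the function names and the six-hour weather change interval are hypothetical.

```python
def concurrency_ratio(change_interval_hours: float, update_interval_hours: float) -> float:
    """Compare how often a phenomenon changes with how often its data are updated.

    A value of 1.0 or more means updates keep pace with change;
    a value below 1.0 means the shared information lags the phenomenon.
    """
    if change_interval_hours <= 0 or update_interval_hours <= 0:
        raise ValueError("intervals must be positive")
    return change_interval_hours / update_interval_hours


def is_timely(update_interval_hours: float, needed_interval_hours: float) -> bool:
    """Timeliness is user-specific: are updates at least as frequent as the task needs?"""
    return update_interval_hours <= needed_interval_hours


# Hypothetical figures: weather assumed to change roughly every 6 hours.
WEATHER_CHANGE_HOURS = 6.0
newspaper = concurrency_ratio(WEATHER_CHANGE_HOURS, 24.0)  # daily weather report
broadcast = concurrency_ratio(WEATHER_CHANGE_HOURS, 1.0)   # hourly broadcast
```

Under these assumed figures the daily report scores lower than the hourly broadcast, while the same daily report would still count as timely for a user who only needs yearly summaries.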

These three criteria form a user view of the quality of information sharing. A second perspective is that of the system’s architect or manager, and assesses the quality of the sharing mechanism itself, which depends on its reciprocity, scalability, and non-intrusiveness:

Reciprocity: how readily can participants both send and receive information? For instance, with a newspaper or television broadcast a source sends information to receivers; receivers have few opportunities to communicate back to the source, or to other receivers. By comparison, a book club or online forum is more reciprocal in that it allows its participants both to give and receive information.

Scalability: how much maintenance effort is needed to add large numbers of participants or datasets, or wide varieties of data and protocols, to the infrastructure? Some databases are quite rigidly specified: each additional user or data source requires as much work to set up as the first user or source. More scalable systems can grow in size or use with much less incremental effort.

Non-intrusiveness: how little change does information sharing impose on participants’ internal work procedures? Accessing a database, and including its information in an ongoing project, might require very specific formats, software, or even operating systems or hardware: such an infrastructure would be quite intrusive. Recent "open systems" and "interoperability" efforts have sought to facilitate the non-intrusive transfer of information and instructions between independent computer systems.

b. Choice of cases

As mentioned earlier, I wanted to focus on cases that could serve as examples for a wide range of organizations, and where geographic information sharing seemed likely to grow in its impact on participating agencies. Therefore, I chose sites for study according to the high quality of their information-sharing, as outlined above, and I applied several additional selection criteria. First, to focus on task-related information sharing, I favored non-supplier organizations, ones whose purpose was research, policy, or management, over ones that existed simply to provide data to others. Second, I favored cases where an infrastructure of some kind was being built to enable systematic, planned information sharing (i.e. not just ad hoc sharing, via methods invented anew each time). Third, I favored loosely-coupled coalitions over strong "umbrella" agencies that could decree uniform standards and procedures. Fourth, I chose agency groups that had formed around a clearly identified shared natural resource: this provided a "superordinate goal" that would minimize organizational resistance to change. And fifth, I favored cases where geographic information was being shared. So in summary, I looked for loosely-coupled coalitions of non-supplier agencies that were building an infrastructure to share geographic information in managing a shared natural resource.

To find suitable cases for study, I obtained initial contacts around the country from colleagues and a few Boston-area Federal offices. These people in turn referred me to their colleagues and acquaintances who knew of geographic information sharing efforts in environmental contexts; and after a few cycles of this, I learned of several interagency information sharing efforts motivated by shared ecosystems. I spoke by telephone with the information systems or GIS coordinators for the Chesapeake Bay, Gulf of Mexico, and Great Lakes programs of the Environmental Protection Agency (EPA); the Gulf of Maine Council on the Marine Environment; the Northern Forestlands Study in northern New England; the Silvio O. Conte Wildlife Refuge on the Connecticut River; the Delaware River National Estuary Program; the Bonneville Power Administration’s Northwest Environmental Database; the Lake Tahoe Regional Planning Agency; and the Tennessee Valley Authority’s TERRA decision support system.

Two findings quickly became clear from this preliminary inquiry. First, in order to have any cases to study, I needed to relax my "quality" criteria, which were quite stringent and applied to a more highly networked context than that of these agencies. Second, several of these programs were not building an infrastructure: their information sharing consisted of either ad hoc exchange of files on diskettes, a one-time effort to compile information from many sources, or a homogeneous distributed information system within a single agency. Thus, I narrowed my field to the Great Lakes Information Network (GLIN), the Gulf of Maine Environmental Data and Information Management System (EDIMS), and the Northwest Environmental Database (along with its successors, the Coordinated Information System and StreamNet). These were enough for the exploratory analysis sketched in section d below. A brief sketch of the three cases is helpful before I describe my chosen research methods.

Great Lakes Information Network (GLIN). The Great Lakes Commission, a partnership of eight US states, began experimenting with the Internet in 1993 to enhance communication and coordination among the many groups concerned with the Great Lakes. The resulting Great Lakes Information Network (GLIN) (cf. Ratza, 1996) links the Great Lakes states (Minnesota, Wisconsin, Illinois, Indiana, Michigan, Ohio, Pennsylvania, and New York), the Canadian province of Ontario, and several Federal agencies and other public and private groups in both the US and Canada. GLIN saw rapid, unpredictable growth in size and usage through the "evangelistic" efforts of its founders and the nature of the emerging World Wide Web, and encountered challenges related to its growth, including distributed data management and interaction with other information sharing initiatives.

Gulf of Maine Environmental Data and Information Management System (EDIMS). Three U.S. states (Massachusetts, New Hampshire, and Maine) and two Canadian provinces (New Brunswick and Nova Scotia) formed the Gulf of Maine Council on the Marine Environment in early 1990 to facilitate joint use and protection of their shared coastal and marine habitats. One of the Council’s main concerns was the interchange of information among its participants, by means of an Environmental Data and Information Management System (EDIMS) (Brown et al., 1994). This System was initially built in the pre-Web years, then lost its funding for a couple of years just as the Web was becoming widespread, and had to regain its legitimacy amidst a very diverse set of organizations.

Northwest Environmental Database / Coordinated Information System / StreamNet. Beginning in 1984, the states and tribes of the Pacific Northwest (Montana, Idaho, Washington, and Oregon) worked with the federal Bonneville Power Administration to build two region-wide river information systems: the Northwest Environmental Database (NED), which described river-related fisheries, wildlife, and hydroelectric facilities, and the Columbia River Coordinated Information System (CIS), which tracked anadromous fisheries. Data for both systems were shared via coordinators in each state, using geographic stream identifiers. Both efforts encountered a variety of challenges in maintaining region-wide standards and joint usage among agencies with widely varying technological maturity. In 1996, both systems were subsumed into an advanced Internet-based data interchange system, known as StreamNet (BPA, 1996).

c. Data collection methods

To compare the information sharing infrastructures to each other, understand their patterns of change, and document new tasks or processes specifically enabled by information sharing, I needed an in-depth view of the choices people had made and were making, along with the context in which these choices were made. I also knew that my chosen topic was a complex, unpredictable one that could well take unexpected turns as I learned about my cases. Therefore, in keeping with Yin (1984), I opted for loosely structured interviews with the key architects, builders, users, and supporters of the infrastructures. I drew up a three-page interview protocol, in five parts: (i) new tasks and processes due to shared information; (ii) the size and quality of information sharing; (iii) the relationship of information sharing to changed tasks or processes [for internal validity (Yin, 1984)]; (iv) the infrastructure’s technological design and context; and (v) its organizational context.

In each case, to select interview participants, I began with written materials sent to me by my contact from my preliminary inquiries. Using these materials (and Web pages where available), I drew up a preliminary list of names from committee rosters, author lists, and meeting attendance lists. I submitted this list to the "informant" in each case, who suggested additions, removals, and corrections; and then I set about contacting the various people on my list, either by electronic mail or by telephone, to schedule in-person, onsite interviews. During initial contacts with these interviewees, several of them in turn suggested additional interviewees. Table 3-1 summarizes the parameters of the three case study efforts.

                      Great Lakes Information Network (GLIN)      Pacific Northwest Environmental   Gulf of Maine EDIMS
                                                                  Database / CIS / StreamNet

Fieldwork             7 days                                      10 days                           6 days
                      Feb. 17-28, 1995                            Apr. 25 - May 5, 1995             May 23 - July 19, 1996

Locations             IL: Chicago                                 OR: Portland, Salem, Gladstone    NH: Durham
                      MI: Ann Arbor, Detroit, Lansing, Bay City   WA: Olympia, Lacey, Seattle       ME: Augusta, Falmouth, Biddeford
                      ON: Windsor, Toronto, Etobicoke, Hamilton   MT: Kalispell, Helena             NS: Halifax, Dartmouth
                                                                  ID: Fort Hall, Boise              MA: Boston

Interviewees          17                                          32                                12

Documents collected   7" shelf space, 15 lbs.                     36" shelf space, 78 lbs.          10" shelf space, 22 lbs.

Draft review          April 1995                                  Jan. 1996                         Aug. 1996

Table 3-1. Case study parameters

For each case, prior to embarking on fieldwork, I conducted extensive Web searches to familiarize myself with my interviewees’ context and tasks as much as possible before meeting them, so as to adapt the interview to their sphere of activity. For each case, I conducted the interviews, 45 minutes to 2 hours each, in a one to two-week automobile tour of the states and provinces involved (for the third case, closer to home, my fieldwork was more drawn out). I used portions of the interview protocol (not all of it applied to each interviewee) to investigate the use and role of shared information in each organization, the apparent factors in its growth or stagnation over time, and the role of technological and organizational contexts and priorities.

These interviews led to additional readings (internal reports, educational brochures, electronic mail archives, meeting minutes, and Web pages); additional telephone and electronic-mail conversations with interviewees; and reviews of databases, geographic datasets, and online data services. I continued doing spot checks of the key Web pages in each case every few weeks, right up until May 1997. As listed in Table 3-1, I sent out draft case summaries for review to many interviewees (those whose information I used) at three points along the way: their responses, mostly by electronic mail, provided accuracy checks and additional insights. Other sources of information included an educational videotape, a few lengthy unstructured conversations, attendance at committee meetings in the Gulf of Maine case, and (especially in the Pacific Northwest case) direct observation of dams, electric power facilities, fish hatcheries, and landscapes (I drove 2,800 miles). All of these measures strengthened the study’s construct validity (Yin, 1984).

d. Qualitative analysis

Analysis of the study’s findings followed methods outlined by Yin (1984) and Eisenhardt (1989) for literal and theoretical replication. I expected the cases to build on each other according to literal replication, in which similar findings reinforce the logic derived from earlier ones. I also hoped to draw theoretical replication from two dissimilarities between the cases. First, the Pacific Northwest case, in the absence of a data network, would show different outcomes; second, the cases ranged in longevity from about a year to over ten years, possibly suggesting fruitful growth paths over time.

This was my a priori strategy; but Yin is rather brief on the difficult step of generating patterns from complex findings. The results from my fieldwork proved complex enough to require additional hypothesis-generating approaches. For this I drew on grounded theory (Turner, 1983; Glaser and Strauss, 1967), an inductive approach that generates structural patterns and hypotheses from repeated synthesis of the qualitative findings themselves, within their full organizational context, in contrast with traditional hypothesis-testing methods, which seek a more context-neutral, formal model. Indeed, although my initial quest was for a variance theory predicting an infrastructure’s organizational outcomes based on its technological design, I soon shifted to a process theory (Markus and Robey, 1988), to understand the choices made in context as the infrastructures were conceived, designed, built, used, and altered over time. This approach was a better fit to the complexity of the information-sharing phenomenon and the richness of the chosen empirical contexts, which are influenced just as much by individual choices as by a set of factors or conditions. The strength of a process model is that it provides a rich understanding of these choices and behavioral patterns as they unfold over time, which can then be used to predict outcomes in closely similar contexts. The process approach also made good use of the individual views expressed by interviewees (and of the researcher’s own partial understanding), instead of discarding them in search of an abstract set of variables and causal relationships.

e. Discussion

This research method presents several potential weaknesses, including its limited generalizability beyond a narrow range of cases, the tension between individual choices and broader trends, and the iterative, never-finished analysis process.

First, some have criticized the qualitative, case-based approach for its difficulty in generalizing beyond a few specific contexts. In response, Yin (1984) argues that cases are not statistical "samples," and that the goal in case study research is to understand behavioral logic ("analytical generalization"), not to enumerate frequencies ("statistical generalization"). Yet this is only a partial response: even behavioral logic patterns that arise from such research are usually only necessary, not sufficient, conditions (Markus and Robey, 1988), so the generalizations provided by this research may still seem rather weak. However, when studying voluntary human decisions and actions, such "soft" predictions are expected and appropriate: social forces, political contexts, or technological resources cannot have a predictable effect on people’s choices. Thus, the value of case study findings lies not in their complete generality, but in the behavioral insights they suggest. These insights can be used to build organizational and technological savoir-faire for other, similar contexts, rather than universally valid formulas or techniques.

Second, my choice of cases might seem overly narrow. It is indeed skewed towards decentralized, loosely-coupled technological and organizational structures, and may offer few implications for more integrated or centralized structures, more unsystematic, ad hoc approaches, or even organizations that are not sharing information at all. But my choice was motivated by an interest not just in information sharing per se, but in the design and implementation of infrastructures for information sharing between independent systems. To broaden the set of cases, I would have had to stretch the infrastructure concept quite far; I had already loosened my "quality" criteria quite a bit to fit the mid-1990s networking context. Another limitation of my choice of cases is that they all feature a shared natural resource. This choice excluded inter-agency groups organized around metropolitan areas, transportation networks, or states; and local or national cooperative information systems efforts. However, this choice ensured that organizations were linked by a well-defined physical relationship, and it provided a degree of comparability across the cases.

Third, the narrative, context-specific nature of my interviews and findings presented the risk of concentrating on individual decisions, and missing underlying trends. The structuration perspective presented in Chapter 2, with its sensitivity to broad structural shifts, provides some protection against this risk, but the study’s conclusions do remain open to debate about the role of individual choices vs. underlying social forces. Here again, however, this "limitation" is in fact a more accurate representation of human behavior within social systems: both individual choices and broader trends are usually at work, and become more or less visible depending on the level of analysis: individual, meso-, or macro-scale (Misa, 1994). Monitoring the cases for some time would help to confirm, correct, and deepen the study’s insights on the role of individual choices vs. that of broader societal forces.

Fourth, my loosely structured, qualitative approach made it difficult to predict not only the study’s findings, but even the nature of those findings: prior to collecting and analyzing the data, I had an interview protocol, but the deeper questions to be answered were themselves undefined. Thus, a single set of interviews provided only limited insight; and subsequent analysis was quite difficult, given that it was not guided by any prior structure or hypothesis. Drawing conclusions from these findings was not a one-time event, but an ongoing process of (re)interpretation—long after formal data collection was complete. This is not surprising given the complexity of the study topic: its conclusions are not only context-specific and not fully certain, but perennially subject to interpretation and debate even within their own context. However, this kind of study is valuable not for its finality, but for its insights into behavioral logic. This logic, though context-specific, is nonetheless likely to hold in other, similar contexts, and can thus be used to build organizational and technological savoir-faire. The lessons learned from this study are intended to be cogent contributions to ongoing discussions in the technological, organizational, and policy fields. For this sort of learning, the qualitative, interpretive approach followed here works very well.

3. Prototype development

For the second phase of the research, my objective was to provide a "reality check" on the case-study findings, a technological counterpoint to the case studies’ organizational focus, that would help me to extrapolate their findings beyond the current state-of-the-art into the near future within a changing technological context. The next three paragraphs explain the choice of prototype, the a priori design choices, and methods for evaluation.

a. Selection of prototype

It’s hard to prototype an information infrastructure usefully in a laboratory setting, especially the kind investigated here. However, the US National Spatial Data Infrastructure provided a natural context for this kind of experimentation; and the Federal Geographic Data Committee’s seed funding, through the Competitive Cooperative Agreements Program (CCAP), was a good match to the intent of my research, and to the timeline and scope I needed. So, together with a small team of researchers in the Computer Resource Laboratory of MIT’s Department of Urban Studies and Planning, I wrote a proposal for CCAP funding in February 1995 (just prior to my first segment of fieldwork). With partial support from these funds, I worked on this project between November 1995 and April 1997, while following up from the first two case studies and conducting the third.

In keeping with the Competitive Cooperative Agreements Program, the project was to create a node on the NSDI’s Geospatial Data Clearinghouse—that is, a Web-based service providing a searchable standard interface to geographic metadata. This was part of what I wanted to experiment with; but I also wanted to learn what it took to provide the geographic information itself online. While not a requirement of CCAP funding, this was nonetheless an area of great interest to the NSDI Clearinghouse (Nebert, 1995) and to other members of the team at MIT.

The choice of orthophotos arose from three lines of reasoning. First, orthophotos were gaining prominence in state GIS centers, under the US Geological Survey’s digital orthophoto quadrangle (DOQ) program, and they were increasingly seen as an important "Framework" data layer in the National Spatial Data Infrastructure (FGDC, 1995). Second, the usual way in which orthophotos were distributed (very large files on compact discs) restricted their use to a small cadre of advanced GIS specialists, even though the full orthophotos, at full resolution, were rarely what people needed. So it seemed worthwhile to investigate alternate distribution mechanisms for these orthophotos over the Internet (e.g. delivering only the desired piece of an orthophoto, and at a variety of resolution levels) to see whether these mechanisms might make orthophoto data more accessible to a wider audience. Third, given that orthophotos are raster images precisely aligned along geographic coordinate axes, they provided a simple data structure, with simple ties to geography. This would allow the project to explore the design of the data interface, and questions of geographic standards and interoperability, without spending too much time parsing and translating file formats, or transforming the images to fit geographic coordinates.
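
Because an orthophoto is a raster aligned with the coordinate axes, delivering "only the desired piece" at a chosen resolution reduces to simple arithmetic. The following is a minimal sketch of that arithmetic, not the prototype's actual code: it assumes a north-up image with square pixels, and the `Raster` class, function names, and coordinate values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Raster:
    """Hypothetical description of a north-up orthophoto with square pixels."""
    x_min: float       # easting of the image's left edge (ground units)
    y_max: float       # northing of the image's top edge (ground units)
    pixel_size: float  # ground units per pixel
    width: int         # number of pixel columns
    height: int        # number of pixel rows


def pixel_window(r: Raster, west: float, south: float, east: float, north: float):
    """Map a geographic bounding box to a half-open pixel window (col0, row0, col1, row1)."""
    col0 = math.floor((west - r.x_min) / r.pixel_size)
    col1 = math.ceil((east - r.x_min) / r.pixel_size)
    row0 = math.floor((r.y_max - north) / r.pixel_size)  # rows count downward from the top
    row1 = math.ceil((r.y_max - south) / r.pixel_size)
    # Clamp the window to the image extent.
    col0, col1 = max(col0, 0), min(col1, r.width)
    row0, row1 = max(row0, 0), min(row1, r.height)
    if col0 >= col1 or row0 >= row1:
        raise ValueError("requested area does not overlap the image")
    return col0, row0, col1, row1


def decimation_step(r: Raster, target_pixel_size: float) -> int:
    """Keep every n-th pixel to approximate a coarser resolution (never finer than the source)."""
    return max(1, round(target_pixel_size / r.pixel_size))


def excerpt_shape(window, step: int):
    """Columns and rows of the excerpt actually sent over the network."""
    col0, row0, col1, row1 = window
    return math.ceil((col1 - col0) / step), math.ceil((row1 - row0) / step)
```

For example, with a hypothetical 8000 x 8000 image at 0.5 ground units per pixel, a 100 x 100 unit request at a resolution of 2 units per pixel yields a 50 x 50 pixel excerpt rather than the full 64-million-pixel file.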

b. Design choices

Some of the design choices behind the orthophoto prototype, such as standardized, searchable metadata, were fixed by the requirements of the CCAP funding. Other choices went beyond the program’s requirements, and were more open-ended. For instance, from the earliest design stages, it was clear that the orthophoto browser had to use the network sparingly, so as to offer a viable alternative to CDs or other conventional distribution mechanisms. This called for distributing small orthophoto excerpts in compressed form. Also, because the information was geographical in nature, it was important to give the browser an intuitive geographic interface, with basic actions such as zoom in/out and pan north/south/etc., but without re-creating an entire mapping package. A third design goal, motivated by an interest in broader questions of geographic interoperability, was to facilitate reading the orthophoto data into client-side GIS or mapping software, to help users integrate local with remote data. A fourth development goal (really an underlying principle) was a staged development path that would define and target intermediate functionality goals, while preparing for more sophisticated capabilities. This led, for instance, to getting a simple interface up and running quickly with fixed image tiles; but the underlying script anticipated a variable tile size and was therefore easy to adapt.
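
The zoom and pan actions, and the idea of keeping tile size as a parameter rather than a constant, can be sketched as operations on a small view state. This is an illustrative assumption about how such a browser might be organized, not a description of the prototype's script; the `View` class, its field names, and the default of 256 pixels are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class View:
    """Hypothetical browser state: a square excerpt centered on a ground location."""
    center_x: float
    center_y: float
    ground_size: float      # ground units spanned by one side of the excerpt
    tile_pixels: int = 256  # pixels per side; a parameter, so tile size can vary


# Unit offsets for each pan direction (x grows eastward, y grows northward).
DIRECTIONS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def pan(view: View, direction: str, fraction: float = 0.5) -> View:
    """Shift the view by a fraction of its own extent in the given direction."""
    dx, dy = DIRECTIONS[direction]
    step = view.ground_size * fraction
    return replace(view, center_x=view.center_x + dx * step,
                   center_y=view.center_y + dy * step)


def zoom(view: View, factor: float) -> View:
    """factor > 1 zooms in (smaller ground extent); factor < 1 zooms out."""
    if factor <= 0:
        raise ValueError("zoom factor must be positive")
    return replace(view, ground_size=view.ground_size / factor)


def bounds(view: View):
    """Bounding box (west, south, east, north) to request from the server."""
    half = view.ground_size / 2
    return (view.center_x - half, view.center_y - half,
            view.center_x + half, view.center_y + half)


def ground_resolution(view: View) -> float:
    """Ground units represented by each pixel of the excerpt."""
    return view.ground_size / view.tile_pixels
```

Keeping `tile_pixels` in the state rather than hard-coding it mirrors the staged approach described above: a first version can always use the default, and later versions can vary it without changing the pan and zoom logic.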

c. Evaluation

Given the nature of the project and my use of it in the context of this thesis, the focus is not on a detailed usage study, but on building and interpreting a "proof-of-concept" to illustrate and instantiate several ideas about geographic information sharing. Nevertheless, one important step was taken to provide valuable usage information: when the orthophoto browser was announced, the Web server’s log was restarted and it logged every Web transaction over more than six months. I assessed people’s use of the orthophoto prototype in three ways: I perused and analyzed the Web server’s transaction log (Sept. 1996-March 1997) through relational database queries and wwwstat software; I read and kept the many comments that came in by electronic mail from users of the browser; and I placed telephone calls to a few known users in Boston city government to gather anecdotal evidence about modes of use.
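
The kind of log summary described above could be approximated along the following lines. This is a minimal sketch, not the database queries or wwwstat configuration actually used: it assumes the server wrote its log in NCSA Common Log Format, counts only successful (status 200) requests, and uses hypothetical function names.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident authuser [time] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str):
    """Return a dict for one log line, or None if the line is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    parts = m.group("request").split()
    path = parts[1] if len(parts) >= 2 else ""
    return {"host": m.group("host"), "when": when,
            "path": path, "status": int(m.group("status"))}


def monthly_summary(lines):
    """Map 'YYYY-MM' to (successful requests, distinct client hosts)."""
    requests = Counter()
    hosts = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["status"] != 200:
            continue  # skip malformed lines and unsuccessful requests
        month = rec["when"].strftime("%Y-%m")
        requests[month] += 1
        hosts.setdefault(month, set()).add(rec["host"])
    return {m: (requests[m], len(hosts[m])) for m in sorted(requests)}
```

Distinct hosts are only a rough proxy for distinct users, since several people may share one machine or proxy; that is one reason to complement log counts with the electronic mail comments and telephone calls mentioned above.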

4. Synopsis

As a result of this hybrid study, the structure of the next few chapters may require a brief explanation. Chapters 4, 5, and 6 contain the case-by-case summaries from fieldwork. These write-ups follow a common outline, which facilitates cross-case comparisons and synthesis in Chapter 7. Chapter 8 then describes the orthophoto prototype project, and draws conclusions from both the development process and the final product achieved. Lastly, Chapter 9 draws on the conclusions from both the case studies and the prototype to sketch the implications of effective geographic infrastructures for technology, organizations, and policy.

