1995 Working Paper Abstracts

TDQM-95-01: January 1995 Designing Information Systems to Optimize the Accuracy-Timeliness Tradeoff
by Donald Ballou and Harold Pazer

It is well known, of course, that the assessment of this month's economic activity will improve with the passage of time. The same situation exists for many of the inputs to managerial and strategic decision processes. Information regarding some situation or activity at a fixed point in time becomes better with the passage of time. However, as a consequence of the dynamic nature of many environments, the information also becomes less relevant over time. This balance between using current but inaccurate information or accurate but outdated information we call the accuracy-timeliness tradeoff. Through analysis of a generic family of environments, procedures are suggested for reducing the negative consequences of this tradeoff. In many of these situations, rather general knowledge concerning relative weights and shapes of functions is sufficient to determine optimizing strategies.

Also in Information Systems Research (ISR), Vol. 6, No. 1, 1995, pp. 51-72.

TDQM-95-02: February 1995 A Framework for Analysis of Data Quality Research
by Richard Wang, Veda Storey, and Christopher Firth

Organizational databases are pervaded with data of poor quality. However, there has not been an analysis of the data quality literature that provides an overall understanding of the state-of-the-art research in this area. Using an analogy between product manufacturing and data manufacturing, a framework for analyzing data quality research is developed and is used as the basis for organizing the data quality literature. This framework consists of seven elements: management responsibilities, operation and assurance costs, research and development, production, distribution, personnel management, and legal function. The analysis reveals that most research efforts focus on operation and assurance costs, research and development, and production of data products. Unexplored research topics and unresolved issues are identified, and directions for future research are provided.

Also in IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 4, 1995, pp. 349-372.

** This paper supersedes TDQM-93-11

TDQM-95-03: February 1995 Managerial Issues in Data Quality
by Donald Ballou and Giri Kumar Tayi

Data has become the raw material for the Information Age. Accordingly, maintaining and enhancing the quality of the data resource needs to be an important organizational activity. However, doing this presents a series of managerial challenges. These arise in part from the nature of data but more from the increasingly prevalent trend of using data in ways that were not envisioned or intended. An individual or group charged with responsibility for the data's integrity must determine who is using what data for which purpose, assess the nature and extent of any deficiencies that may exist, and evaluate the impact that quality problems could have on the various uses of the data. This paper explores issues such as these, while highlighting concepts, techniques, and tools for ensuring that the organization's data is of high quality.

Also in The 1996 Conference on Information Quality, Cambridge, MA, pp. 186-206

TDQM-95-04: February 1995 Quality Dimensions of a Conceptual View
by Anany Levitin and Thomas Redman

Data quality is usually associated with the quality of data values. But even perfectly correct data values are of little use if they are based on a deficient data model. The purpose of this paper is to present and discuss a list of characteristics (dimensions) that are crucial for data model quality. We single out 14 quality dimensions, organized into six categories: content, scope, level of detail, composition, consistency, and reaction to change. Two types of correlation among dimensions, called "reinforcements" and "tradeoffs," are recognized and discussed as well.

Also in Information Processing & Management, Vol. 31, No. 1

TDQM-95-05: February 1995 Data and Data Quality
by Christopher Fox, Anany Levitin, and Thomas Redman

This article presents a definition of data and their quality dimensions as the basis for a survey of data quality control and improvement techniques, with special attention to the state and prospects for the quality of data in information retrieval systems and library catalog systems (hereafter jointly referred to as document information systems). This article does not treat questions of data system quality, such as timeliness of update, system reliability, system accessibility, usability, and data security.

Also in Encyclopedia of Library and Information Science

TDQM-95-06: February 1995 Data Quality
by Y. U. Huh, F. R. Keller, T. C. Redman and A. R. Watkins

Data are used in the delivery of many products and services, and so data quality is an important component of customers' perceptions of the quality of these products and services. The paper describes efforts initiated by AT&T to control and improve the quality of data it uses to operate its worldwide intelligent network, to conduct its day-to-day operations, and to manage its businesses smoothly. These efforts stem from the observation that it is extremely difficult to fix faulty data once they are in a database. Therefore attention must be directed at processes that introduce, modify, and transform data. Only when these processes have been put into a state of statistical control can sustainable improvements in data quality be expected. The report describes AT&T's four-part data quality improvement program.

TDQM-95-07: February 1995 Data Quality for Telecommunications
by Thomas Redman

In the last several years, the importance of data in large databases to the operation of telecommunications networks has grown considerably. For example, all provisioning, maintenance, and billing operations are critically dependent on data, and many new network services are based on real-time access to data. This makes data quality a major issue for the industry. The purpose of this paper is to outline an approach for addressing at least part of this issue. This approach makes use of process management, to focus attention on processes that create data, and of data tracking, one method to quantify the performance of such processes. The merits of this approach are compared to more traditional methods of "cleaning" databases. A joint AT&T / LEC process, by which special services access is ordered and provisioned, is used as an example.

Also in IEEE Journal on Selected Areas in Communications, Vol. 12, No. 2

TDQM-95-08: February 1995 The Notion of Data and Its Quality Dimensions
by Christopher Fox, Anany Levitin, and Thomas Redman

The rapid proliferation of computer-based information systems is increasing the importance of data quality to both system makers and users. However, there is neither an established framework nor common terminology for investigating data quality. There is not even agreement on what the term "data" means. We lay a foundation for the study of data quality in this paper. In the first part of the paper we discuss five approaches to defining "data" in the literature. We then propose an approach especially conducive to discussing data quality. In the second part of the paper we discuss the most important dimensions of data quality: accuracy, completeness, consistency, and currentness. We define these four and several related dimensions and discuss them in detail. We close the paper by outlining several areas for further research on data quality.

Also in Information Processing & Management, Vol. 30, No. 1

TDQM-95-09: February 1995 A Model of the Data (Life) Cycles with Application to Quality
by A. V. Levitin and T. Redman

The purpose of the paper is to present a new model of the data life-cycle. Such a model is needed to clarify activities involving data, from its creation through use, and to establish the relationships of these activities to one another. The proposed model features four principal data cycles: the acquisition cycle includes activities that create and store data, the usage cycle includes activities that retrieve and use data, and two kinds of combined cycles incorporate both acquisition and usage activities. The model also includes quality checkpoints and feedback loops. These are particularly useful in clarifying data quality issues.

Also in Information and Software Technology, Vol. 35, No. 4

TDQM-95-10: February 1995 Improve Data Quality for Competitive Advantage
by Thomas Redman

Errors in data can cost a company millions of dollars, alienate customers, and make implementing new strategies difficult or impossible. The author describes a process AT&T uses to recognize data-quality problems, treat data as an asset, and apply quality systems to the processes that create data.

Also in Sloan Management Review, Winter 1995

TDQM-95-11: February 1995 A Data Quality Algebra for Estimating Query Result Quality
by Richard Wang and M.P. Reddy

Information superhighways have received much attention in government, business, academic, and media circles recently. In accessing data sources along the information highway, however, it is important to know the meaning of the data and the qualities of data retrieved. Despite the elegant relational theory and mechanisms such as integrity constraints, many databases contain deficient data. If the data in the underlying base relations are deficient, then the query results obtained from these base relations may also contain deficient data even if the query processing mechanism is flawless. These deficient data, in turn, may lead to erroneous decisions, resulting in significant social and economic impacts. In this research, we propose a method to estimate qualities of query results given the quality characteristics of the underlying base relations.

Also in Lecture Notes in Computer Science

TDQM-95-12: February 1995 Integrating Information From Global Systems: Dealing with the "On-and-Off-Ramps" of the Information Superhighway
by Stuart Madnick

Information superhighways offer the possibility to access information from around the world in support of many important applications. Unfortunately, there are significant challenges to be overcome. One particular problem is context interchange. Each source of information and potential receiver of that information may operate with a different context. A context is the collection of implicit assumptions about the context definition (meaning) and context characteristics (quality) of the information. When the information moves from one context to another, it may be misinterpreted. This paper describes various forms of context challenges and examples of potential context mediation services, such as data semantics acquisition, data quality attributes, and evolving semantics and quality, that can mitigate the problem.

Also in Journal of Organizational Computing.