MIT TDQM Program

 

1993 Working Paper Abstracts


TDQM-93-01: March 1993 A Process View of Data Quality
by Henry Kon, Jacob Lee & Richard Wang
This paper defines data quality from a process perspective. A formal process model of an information system offers precise process constructs for characterizing data quality. With these constructs, we rigorously define the key dimensions of data quality. This process view of data quality is particularly relevant in light of the recent push towards process orientations in business analysis and business process design.

TDQM-93-02: May 1993 A Qualitative Approach to Automatic Data Quality Judgment
by Yeona Jang, Alexander Ishii & Richard Wang
Published in the Journal of Organizational Computing.
** This paper supersedes TDQM-92-05.


TDQM-93-03: April 1993 Closing the Data Quality Gap: Using ISO 9000 to Study Data Quality
by Chris Firth & Richard Wang
This paper presents an application of the ISO 9000 international quality standard to data. The ISO framework provides a comprehensive set of considerations for quality management that range from issues such as quality in marketing to quality in product safety and liability. Using this framework, the paper presents a survey of data quality research; the selection criterion is to include work by authors who explicitly recognize a data quality problem. The paper will be especially useful to researchers new to the field of data quality.


TDQM-93-04 Assessing the Accuracy of Derived Data: An Analytical Approach Based on the Relational Model
by Richard Wang & M. P. Reddy
What is the quality of the data you "consumed" in the last report you read? This paper presents a mechanism that, given a user query and the accuracy of the underlying base relations, computes the accuracy of the derived data. By knowing the accuracy of the derived data, the user can judge the risk involved in using the data. The accuracy of derived data also enables data quality analysts to study the relative importance of various quality control strategies. Such a mechanism may also identify likely candidate tables for quality enhancement.
** This paper is superseded by TDQM-96-07.
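A minimal sketch of the idea, not the mechanism developed in the paper: assume that a relation's accuracy is the fraction of its tuples that are entirely correct and that errors are independent of one another and of the query. Under those assumptions, a selection is expected to preserve its input's accuracy, and a join multiplies the accuracies of its inputs, because a joined tuple is correct only if both of its source tuples are correct. The relation names and accuracy values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A base or derived relation annotated with an estimated accuracy.

    Assumption for this sketch: accuracy is the fraction of tuples
    that are entirely correct.
    """
    name: str
    accuracy: float


def select(r: Relation, predicate: str) -> Relation:
    # Assumes errors are independent of the selection predicate, so the
    # selected subset is expected to be as accurate as its input.
    return Relation(f"select[{predicate}]({r.name})", r.accuracy)


def join(r: Relation, s: Relation) -> Relation:
    # A joined tuple is correct only if both input tuples are correct;
    # with independent errors, the expected accuracy is the product.
    return Relation(f"({r.name} JOIN {s.name})", r.accuracy * s.accuracy)


# Hypothetical base relations with assumed accuracies.
orders = Relation("orders", 0.95)
customers = Relation("customers", 0.90)

report = join(select(orders, "region = 'East'"), customers)
print(report.name, round(report.accuracy, 3))
```

The independence assumption is what makes the product rule valid; correlated errors (for example, a bad customer record that also corrupts its orders) would require a richer model.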


TDQM-93-05 On the Definition of Data Quality Dimensions
by Yair Wand & Richard Wang
** This paper has been superseded by TDQM-94-03.


TDQM-93-06: May 1993 A Case Study of the Business Impact of Data Quality in the Airline Industry
by Handrito Hardjono
Many organizations experience data quality problems. Some are aware of the problem but ignore it; others are unaware that it exists. Very few organizations take steps to prevent poor data quality from becoming a major problem. This paper presents a case study of an airline company experiencing data quality problems in its inventory recording system. Data quality awareness was triggered twice in this case: first by technology, during the implementation of a new computer system, and second by a conflict between maintenance and sales over delays in maintenance release. This study analyzes the flow of operations and the handling of data on aircraft spare parts, and concludes with possible solutions to the poor data quality it identifies. Our study showed that the solutions offered, if implemented, could save the airline $378,00 per year after three years.
Also available as his Sloan Fellow Thesis


TDQM-93-07: May 1993 A Case Study of Data Quality in the Eyewear Industry
by Shirley Lai
This descriptive case study of a major eyewear company provides a view of one company's attempt to deal with data quality issues and draws some insights from the company's efforts. One of its major projects was the development of a new information system to provide store managers and opticians with more timely and accurate information on order processing and pricing. The paper is based on a three-level framework for analyzing the alignment of company activities and systems. The three levels are business process, data flow, and systems platforms. Observations are made as to how misalignment of these components leads to data quality problems. Improvements are identified that will increase company revenues and store efficiency. Better information capture is also found to improve the interaction between the optical stores and the labs that manufacture spectacles.
Also available as her Sloan Master's Thesis


TDQM-93-08: June 1993 Unavailable


TDQM-93-09: June 1993 A Model of Data Manufacturing for Information Quality Improvement
by Donald Ballou, Richard Wang, Harold Pazer and Giri Kumar Tayi
** This paper is superseded by TDQM-94-06.


TDQM-93-10: June 1993 DQA: A Software Tool for Analyzing Data Quality in Data Manufacturing Systems
by Sung Pak and Alicia Pando
The business community's interest in improving data processing systems has grown as the costs of faulty data skyrocket. DQA (Data Quality Analyzer) is a GUI software tool to assist in the design and analysis of data manufacturing systems. It is based on a functional specification by Professor Don Ballou, Professor Harold Pazer and G.T. Tayi of the State University of New York at Albany, and Professor Richard Wang of M.I.T. This software tool is aimed at reengineering data manufacturing systems to enhance the quality and value of information. Sketching and analyzing data manufacturing systems by hand is extremely tedious, and modifying them is difficult. DQA allows such design and reengineering tasks to be completed in a fraction of the time.


TDQM-93-11: July 1993 Data Quality Research: A Framework, Survey, and Analysis
by Richard Wang, Veda Storey, and Chris Firth
** This paper is superseded by TDQM-95-02.
Conditionally accepted for publication in IEEE Transactions on Knowledge and Data Engineering


TDQM-93-12: October 1993 An Empirical Investigation of Data Quality Dimensions: A Data Consumer's Perspective
by Richard Wang and Diane Strong
** This paper is superseded by TDQM-94-01.


TDQM-93-13: October 1993 Modeling Data Quality and Context Through Extension of the ER Model
by Steven Y. Tu and Richard Y. Wang
Capturing data quality and context semantics at the early stage of database design is a critical issue for both database researchers and practitioners. As in traditional database design, users' quality and context requirements should be represented at the conceptual level. This paper first generalizes the issues governing data quality and context semantics as an Interattribute-Relationship problem, and then examines the feasibility of the Entity-Relationship (ER) model as a solution. An investigation of alternatives using existing ER constructs reveals that it is necessary to extend the ER model.

An extension called Attribute-Relationship (AR) is proposed that allows relationships to be expressed at the attribute level. The attributes involved in such a relationship are termed the strong attribute and the weak attribute. In the AR extension, the dual roles of the strong and weak attributes and the embedded existence dependence between them are represented through an identifying relationship. Finally, a range of integrity constraints related to this extension is presented.

Published in the WITS-'93 Conference Proceedings, Orlando, Florida, December 1993


TDQM-93-14: October 1993 An Object-Oriented Implementation of Quality Data Products
by Richard Wang, M.P. Reddy, and Amar Gupta
This paper investigates how to associate data with quality information that can help consumers judge the quality of data for the specific application at hand. Our research question is how to structure and manage data in such a way that consumers are equipped with the capabilities to measure the quality of the data they need and to retrieve the data that conforms to their quality requirements.

Toward this goal, we propose the concept of a quality data object, in which each datum object is associated with appropriate data and procedures used to indicate the quality of the datum object. Specifically, the is-a-quality-of link is proposed to associate a datum object with its corresponding quality description object. The composite object constructed from a datum object and its associated quality description object provides methods that can access object instances that match consumers' quality requirements. It also provides a set of quality measure methods that compute quality dimension values including currency, volatility, timeliness, accuracy, consistency, and completeness.

We envision that the quality data objects proposed in this paper can be used as basic building blocks for the design, manufacture, and delivery of quality data products. This will enable consumers to measure the quality of data products according to their chosen criteria and to procure data products based on their quality requirements, hopefully enhancing overall data quality and data reusability. We are currently working to provide a more concrete definition of a data product and to crystallize its characteristics in further detail.

Published in the WITS-'93 Conference Proceedings, Orlando, Florida, December 1993
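The quality data object described in TDQM-93-14 can be sketched in Python as a datum paired with a quality description object, plus a retrieval method that returns only instances meeting a consumer's requirement. This is an illustration under stated assumptions, not the paper's implementation: the class and field names are hypothetical, the is-a-quality-of link is modeled as a plain attribute, and the currency and timeliness measures use one formulation from the data manufacturing literature, currency = age + (delivery time - input time) and timeliness = max(0, 1 - currency/volatility)^s, which may differ from the definitions in the paper.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class QualityDescription:
    # Hypothetical quality indicators attached to one datum (times in days).
    age: float          # age of the value when it entered the system
    input_time: float   # time at which the value was entered
    volatility: float   # how long the value remains valid
    source: str


@dataclass
class QualityDataObject:
    # Composite of a datum and its quality description; the
    # is-a-quality-of link is modeled here as a plain attribute.
    value: Any
    quality: QualityDescription

    def currency(self, delivery_time: float) -> float:
        # Assumed formulation: age at entry plus time spent in the system.
        return self.quality.age + (delivery_time - self.quality.input_time)

    def timeliness(self, delivery_time: float, s: float = 1.0) -> float:
        # Assumed formulation: max(0, 1 - currency/volatility) ** s,
        # where s sets how sharply timeliness decays as data ages.
        ratio = self.currency(delivery_time) / self.quality.volatility
        return max(0.0, 1.0 - ratio) ** s


def retrieve(objects: List[QualityDataObject], delivery_time: float,
             min_timeliness: float) -> List[QualityDataObject]:
    # Return only the instances that meet the consumer's quality requirement.
    return [o for o in objects if o.timeliness(delivery_time) >= min_timeliness]


# Hypothetical price data from two sources with different ages.
prices = [
    QualityDataObject(19.99, QualityDescription(age=1, input_time=10, volatility=30, source="feed A")),
    QualityDataObject(18.50, QualityDescription(age=5, input_time=2, volatility=30, source="feed B")),
]
fresh = retrieve(prices, delivery_time=12, min_timeliness=0.8)
print([o.value for o in fresh])
```

Here the first price has currency 3 and timeliness 0.9, while the second has currency 15 and timeliness 0.5, so only the first satisfies a minimum timeliness of 0.8. Keeping the quality description as a separate object mirrors the abstract's separation of datum and quality description, so further measures such as accuracy or completeness could be added without changing the datum itself.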


TDQM-93-15: August 1993 On Validation Approaches in Data Production
by Jacob Lee and Richard Wang
Data quality problems have been studied in different disciplines, either implicitly or explicitly. Thus, the knowledge relevant to data quality management at present is fragmented over various disciplines. In order to facilitate progress in data quality research, a coherent and unified view of the relevant concepts and techniques from these various areas is required. In this paper, we develop a framework that is intended to achieve this goal. This framework is based on a rigorous definition of data quality and three concepts of validity: internal, external and process. This framework presents a means for relating and analyzing various concepts and techniques relevant to data quality research.