In recent years most corporations, large and small, have initiated Total Quality Management (TQM) programs with goals that include 100% satisfaction for customers and no product defects. Quality management programs have been a key factor in the success of companies in many industries.
TQM programs and other strategic corporate initiatives often fall short of their goals, or fail outright, because the data used to monitor and support organizational processes are incorrect, incomplete, or otherwise faulty or unfit for a given application. Anecdotal evidence and a growing literature point to data being defective at levels of 10% or more in a variety of applications and industrial contexts, including sales-force automation, direct-mail programs and productivity improvement programs.
MIT's Total Data Quality Management (TDQM) research effort grew out of industry needs for high quality data. The overall objective of this program is to establish a solid theoretical foundation in this embryonic field and, from this work, to devise practical methods for business and industry to improve data quality. We will develop the tools and other capabilities necessary for data quality management in the technical, economic, and organizational phases of business operations.

Research Agenda
Research Objective
The TDQM project has both long-term and short-term goals. The long-term goal of this research program is to create a theory of data quality based on reference disciplines such as computer science, the study of organizational behavior, statistics, accounting, and the total quality management field. This theory of data quality, in turn, may serve as a foundation for other research contexts where the quality of information is an important component. In the short term, the research goal is to create a center of excellence among practitioners of data quality techniques and to act as a clearinghouse for effective methods and project experiences.
Research Scope
There are three major components of the TDQM research program: data quality definition, analysis, and improvement. The definition component focuses on defining and measuring data quality. The analysis component identifies and calculates the impacts of poor quality data, and the benefits of high quality data, on an organization's effectiveness. Finally, the improvement component involves redesigning business practices and implementing new technologies in order to significantly improve the quality of corporate information. Each of these is briefly described below, along with an example and an outline of key research directions.
Definition of Data Quality. Although the notion of "data quality" may seem intuitively obvious, data quality is not well defined in current practice. Our studies have revealed that data quality has a number of dimensions for data users, including accuracy, believability, relevancy, and timeliness. A clear and uniform articulation of data quality metrics is needed. In fact, even a relatively obvious dimension such as accuracy lacks a definition robust enough to indicate how the accuracy of data should be measured. This component of the research addresses issues of data quality definition, measurement, and derivation.
The research issues we are addressing are: (a) identification of the key dimensions of data quality, (b) precise and meaningful definitions of each dimension, (c) methods of measuring each dimension for base data, and (d) a data quality algebra (DQA) for computing the quality of derived data.
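The research agenda above does not specify how dimensions are measured or how the data quality algebra combines them, so the following is only a minimal sketch under stated assumptions. It treats accuracy of base data as the share of recorded values that match a verified reference sample, and it assumes a hypothetical DQA rule in which a derived value is correct only if every input is correct (with input errors assumed independent) and is as stale as its oldest input. The names `measure_accuracy`, `Quality`, and `derive_quality` are illustrative, not part of the TDQM program.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


def measure_accuracy(recorded: Sequence[object], verified: Sequence[object]) -> float:
    """Assumed metric for base data: fraction of recorded values that
    match a verified reference sample of the same records."""
    if len(recorded) != len(verified) or not recorded:
        raise ValueError("need two non-empty samples of equal length")
    matches = sum(r == v for r, v in zip(recorded, verified))
    return matches / len(recorded)


@dataclass(frozen=True)
class Quality:
    accuracy: float  # estimated probability that the value is correct
    age_days: float  # time since capture, used here as a timeliness proxy


def derive_quality(inputs: Sequence[Quality]) -> Quality:
    """Hypothetical DQA rule: the derived value is correct only when all
    inputs are correct (independent errors assumed), and it is as old as
    its oldest input."""
    if not inputs:
        raise ValueError("a derived value needs at least one input")
    return Quality(
        accuracy=prod(q.accuracy for q in inputs),
        age_days=max(q.age_days for q in inputs),
    )


# Usage: three of four sampled values match the reference, and combining
# a 90%-accurate input with a 95%-accurate one yields roughly 85.5%.
base = measure_accuracy(["A", "B", "C", "X"], ["A", "B", "C", "D"])
derived = derive_quality([Quality(0.9, 2.0), Quality(0.95, 10.0)])
```

Under this rule, derived data are never more accurate than their weakest input, which is one simple way to model how errors compound as data flow through an organization; other rules (for example, for sums or averages where errors can offset) would be equally plausible candidates for a DQA.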
Analysis of Data Quality Impact on a Business. This component addresses the value chain relationship between high quality data and the successful operation of a business (the flip side is how low quality data harms a business). Our analysis techniques relate data quality to key business parameters, such as sales, customer satisfaction, and profitability. To illustrate the importance of this kind of analysis, consider the case of a transportation company. In this company, poor data quality and poor data usage caused 77% of missed deliveries, resulting in significant operating costs from repeated work and rerouted shipments. Even more significant was the finding that the use of poor quality information was the major reason for an estimated loss of market share valued at about 1 billion in sales.
The research issues we are focusing on are: (a) quantification of business impact of data quality to firms through a collection of case studies, (b) development of Data Quality Value Chain Analysis (DQVCA) techniques to relate data quality to key business parameters, such as sales, customer satisfaction, and profitability, and (c) development of an economic model of the value of quality data.
Improvement of Data Quality. This component addresses various methods for improving data quality. These methods can be grouped into four interrelated categories: (i) business redesign, (ii) data quality motivation, (iii) use of new technologies, and (iv) data interpretation technology. Business redesign attempts to simplify and streamline operations to minimize the opportunities for data errors to occur. Data quality motivation uses employee rewards, benefits, and perceptions to encourage the appropriate members of the organization to pay closer attention to the quality of the data they handle. New data capture technologies can significantly improve quality through techniques such as automated entry and direct inter-computer communication. Data interpretation technologies help users understand the meaning of the data so that it is not used incorrectly. In the transportation company case, for example, radio-frequency data entry devices (for capturing equipment and cargo inventory data) were installed in mobile vehicles that scanned up and down container yards to maintain a real-time inventory. This change introduced both a new technology and a business redesign, resulting in more accurate and timely data.
The research issues we are working on are: (a) analyzing direct entry technologies, such as mobile computing technologies, neural network techniques for handwriting analysis, and portable communicating terminals, (b) studying connectivity among information systems, (c) representing and automatically using knowledge about the semantics of the data, and (d) creating new paradigms for system design that incorporate data quality tags, such as tags for time and source.
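To make item (d) concrete, here is a minimal sketch, assuming that each stored value simply carries a source tag and a capture-time tag and that an application selects values by consulting those tags. The `Tagged` type, the `freshest_trusted` helper, and the source names such as `"rf-scanner"` are hypothetical illustrations, not a TDQM design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Generic, Iterable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Tagged(Generic[T]):
    value: T
    source: str            # quality tag: where the value came from
    captured_at: datetime  # quality tag: when the value was captured


def freshest_trusted(
    candidates: Iterable[Tagged[T]],
    trusted_sources: set[str],
    now: datetime,
    max_age: timedelta,
) -> Optional[Tagged[T]]:
    """Return the most recently captured value from a trusted source that is
    no older than max_age, or None if no candidate qualifies."""
    usable = [
        c for c in candidates
        if c.source in trusted_sources and now - c.captured_at <= max_age
    ]
    return max(usable, key=lambda c: c.captured_at, default=None)


# Usage: hypothetical container-location readings from two kinds of source.
now = datetime(2024, 1, 1, 12, 0)
readings = [
    Tagged("yard-3", "rf-scanner", datetime(2024, 1, 1, 10, 0)),
    Tagged("yard-5", "manual-entry", datetime(2024, 1, 1, 11, 0)),
    Tagged("yard-1", "rf-scanner", datetime(2023, 12, 31, 6, 0)),
]
best = freshest_trusted(readings, {"rf-scanner"}, now, timedelta(hours=4))
```

Here the newer manual entry is skipped because its source is not trusted, and the day-old scanner reading is skipped because it is too old, so `best` holds the `"yard-3"` reading. Keeping the tags next to the value, rather than in a separate log, lets any consumer apply its own source and timeliness requirements.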

The TDQM program is a joint effort between members of the MIT Information Technology Group, industry partners, and related industry-specific research programs at MIT (including the Leaders for Manufacturing, the International Financial Services Research Center, and the Center for Transportation Studies). Ultimately, TDQM expects to draw sponsors from a wide range of industries, including finance, transportation, manufacturing, and telecommunications. Fujitsu Personal Systems, Inc. and Bull HN Information Systems, Inc. joined as founding sponsors in 1991.
Mr. Daryll Wartluft, then Vice President of Bull Information Systems, explained why Bull is sponsoring our work as follows: "the TDQM effort is an important part of our relationship with MIT and ties in well with Bull's worldwide commitment both to our own internal quality and to assisting our customers in better understanding and improving their information quality."
Mr. Lou Panetta, then Executive Vice President of Marketing and Sales at Fujitsu Personal Systems, put it this way: "Total data quality management of today's corporations requires 100% accurate data input from the field sales, service, and support organizations. Erroneous data entering the organization compounds itself as decisions are made based on inaccurate information. Accurate and up-to-date information can help companies service their customers better, thereby giving them a competitive edge."
Dr. Robb Wilmot, then Chairman of Fujitsu Personal Systems agrees: "The prototypical chairman's statement in an annual report mentions 100% customer satisfaction by the third paragraph -- if not sooner -- with no explicit recognition in the enterprise that the data with which this goal is to be achieved is typically between 5% to 10% defective. If you analyze the supply chain costs in a large company, it is not uncommon to find that half of the total cost is rework caused by defective data -- with untold competitive costs."
Cooperation with Sponsors
In order to establish a center of excellence on TDQM, we are interested in collecting cases of data quality projects, including completed ones, so that we can document the lessons learned from them and quantify their business impacts. In addition, we are interested in analyzing how various companies approach their respective data quality projects, in both data quality enhancement and data quality control. We are also interested in pursuing other joint activities with sponsors on this project. Organizations with an interest in our research agenda are invited to become sponsoring members of the TDQM Project. Project members can: (a) serve as study or test sites for TDQM research activities, (b) attend special TDQM symposiums at MIT, (c) involve their personnel in TDQM research efforts, (d) receive TDQM working papers, and (e) contribute more generally, through participation and sponsorship, to the development and advancement of TDQM.

Concluding Remarks
Data are used to support most activities in modern organizations, be they operational, managerial, or strategic in nature. When these data are defective, poor data quality can undermine organizational effectiveness and efficiency in many ways. Without a systematic and comprehensive way to conceptualize and address the data quality issue, organizations are left to grapple with this problem in an ad hoc and piecemeal manner. The TDQM effort aims to construct a paradigm for data quality management, to serve as a center of excellence in managerial and technology practice, and to develop a rigorous foundation and discipline for data quality that will extend into the future.