KEY WORDS: Data Quality, Database Systems, Data Administration
Acknowledgments Work reported herein has been supported,
in part, by MIT's Total Data Quality Management (TDQM) Research
Program, MIT's International Financial Service Research Center
(IFSRC), Fujitsu Personal Systems, Inc. and Bull-HN.
ABSTRACT Quality requirements have been largely
overlooked in previous database research. This paper presents
a quality-entity-relationship (QER) approach for incorporating
quality requirements (database quality data and product quality
data) of an application, into conceptual database design. The
underlying premise of this research is that quality requirements
should be distinct from application requirements. The QER approach
extends traditional conceptual design methodologies and leads
to a new direction for further investigation. Practitioners can
benefit from this research because they can incorporate the research
results directly into current database design practices. Moreover,
the QER conceptual design output can be directly translated into
a relational schema that can be implemented in existing database
management systems.
The quality of data in a database management system (DBMS) has been treated through techniques such as concurrency, recovery, integrity, and security control [Codd, 1970; Chen, 1976; Codd, 1979; Bernstein & Goodman, 1981; Fernandez, Summers, & Wood, 1981; Ullman, 1982]. Although these techniques are necessary to ensure the correct operation of a DBMS, they were not developed to address issues related to quality that are important in decision making.
From the user's viewpoint, there are two types of quality data items that need to be captured in a database design: (1) database quality data and (2) product quality data (referred to as quality requirements hereafter). Database quality data captures dimensions such as accuracy, timeliness, completeness, and consistency of data in the database. Product quality data reflects the quality of products, such as car trouble spots, hotel ratings, and customer satisfaction. Existing database design methodologies, however, focus primarily on identifying, capturing, and representing the data items of an application. A database that incorporates quality components would be much more useful and, thus, database design methodologies should be extended to model quality requirements.
Much attention has been paid to the importance of quality in manufacturing. It is recognized that the earlier quality is considered in the production cycle, the lower the cost in the long run, because upstream defects cause downstream inspection, rework, and rejects [Taguchi, 1979; Deming, 1982; Ishikawa, 1985; Deming, 1986; Juran, 1989; Juran, 1992]. The lesson for database design is to design quality into the database. The purpose of this research, therefore, is to extend traditional database design methodologies to incorporate database quality data and product quality data, in addition to database application data.
The objectives of this research are: (1) to develop a conceptual framework for database designers to separate conventional application requirements (referred to as product requirements hereafter) from product quality requirements and data quality requirements, in addition to emphasizing the importance of quality in database design and use, (2) to define a set of constructs fundamental to modeling product quality and data quality requirements, and (3) to analyze various cases that may arise due to different combinations of these constructs, which will help database designers in modeling user quality requirements in conceptual design.
For the purposes of this research, we adopt the following naming conventions for entity-relationship constructs:
attribute: lower case, hyphenate words
"attribute name": double quotes to denote the name of an attribute
'attribute value': single quotes to denote the value of an attribute
Entity: italic, upper case for the first letter of each word
Gerund-Entity: italic, upper case for the first letter of each word, hyphenate words
Primary-Key: upper case for the first letter of each word, underline the phrase
verb phrase of a relationship: italic, lower case, hyphenate words
For exposition purposes, we present an application scenario below. Consider a car rental database that keeps track of the rental cars in its fleet and the customers who rent them. A rental car has a car type, which is categorized based upon its make, model, and year. An entity-relationship model [Chen, 1976; Teorey, Yang, & Fry, 1986] for this application is shown in Figure 1 and its corresponding conceptual design follows.

Entities:
Car Type: [Make, Model, Year, #sold]
Rental Car: [Licence#]
Rental History: [Licence#, Date-Returned, #miles]
Relationships:
Rental Car has Car Type
[1,1] [0,N]
The min/max cardinalities [Tsichritzis & Lochovsky, 1982] indicate that, for each occurrence of the Rental Car entity, there is one and only one corresponding Car Type. Each occurrence of the Car Type entity, however, can have zero to many (N) corresponding Rental Cars.
Rental Car has Rental History
[0,N] [1,1]
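To make the conceptual design above concrete, the following is a minimal sketch of one possible relational rendering, using Python's standard `sqlite3` module. The table and column names (`car_type`, `licence_no`, `num_sold`, and so on) are illustrative assumptions, as is the choice of Licence# plus Date-Returned as the key of Rental History; the paper itself gives only the conceptual design.

```python
import sqlite3

# Illustrative relational rendering of the conceptual design above.
# Table/column names and the (licence_no, date_returned) key for
# Rental History are assumptions, not taken from the paper.
BASE_SCHEMA = """
CREATE TABLE car_type (
    make TEXT NOT NULL,
    model TEXT NOT NULL,
    year INTEGER NOT NULL,
    num_sold INTEGER,
    PRIMARY KEY (make, model, year)
);
CREATE TABLE rental_car (
    licence_no TEXT PRIMARY KEY,
    -- Rental Car is the [1,1] side of "has Car Type", so it holds the key.
    make TEXT NOT NULL,
    model TEXT NOT NULL,
    year INTEGER NOT NULL,
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year)
);
CREATE TABLE rental_history (
    -- Rental History is the [1,1] side of "Rental Car has Rental History".
    licence_no TEXT NOT NULL REFERENCES rental_car (licence_no),
    date_returned TEXT NOT NULL,
    miles INTEGER,
    PRIMARY KEY (licence_no, date_returned)
);
"""


def build_base_db() -> sqlite3.Connection:
    """Create an in-memory database holding the three application relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(BASE_SCHEMA)
    return conn
```

With foreign-key enforcement switched on, inserting a rental car whose car type does not exist is rejected, which mirrors the [1,1] cardinality on the Rental Car side.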
This conceptual model represents the current approach to database design where data items such as Car Type and Customer are captured. Although these types of requirements are essential for the operation of a database management system, they are not sufficient to ensure that quality requirements will be fulfilled. We illustrate this through an analogy to product manufacturing.
In designing a car, it is necessary to ensure that a customer will be able to drive the car from point a to point b. However, a consumer does not purchase a car simply because it can be driven from one location to another. Other requirements, such as safety features, that are quality-oriented play at least an equally important role in the consumer's overall assessment of the car.
Similarly, in the rental car example, many quality-oriented data items are not captured, such as car trouble spots and car ratings. Another component that is not captured is data quality measures such as the accuracy, timeliness, and completeness of all these data. Although some of these quality requirements may have been incorporated by astute database designers in their practices, existing research on database design does not explicitly model these quality requirements.
The remainder of the paper is organized as follows. The fundamental assumptions and the Chinese Wall Principle upon which our quality database design methodology is based are presented in Section 2. Section 3 defines the Quality Entity-Relationship (QER) constructs and analyzes various cases that may arise due to different combinations of these constructs. Section 4 applies these results to an application in which a given set of quality requirements is first translated into a conceptual design, and then into a relational design to demonstrate the practicality of this research. Section 5 summarizes and concludes the paper.
Our objective is to develop a methodology for incorporating quality requirements as an integral part of conceptual database design. To do so, we first present the fundamental assumptions upon which our research is based, and then a design principle, which we refer to as The Chinese Wall Principle. Based on these fundamental assumptions and this principle, we define, in Section 3, the Quality Entity-Relationship (QER) constructs and analyze various cases of quality relationships that arise due to different cardinalities of the participating entities.
This research is based on some assumptions about the need to incorporate quality requirements into database design and the nature of these quality requirements. These assumptions are fundamental to this research, and therefore, are stated explicitly.
Assumption 1: Quality requirements need to be identified as part of user requirements, and incorporated as an integral part of conceptual design.
Application requirements typically deal with data items such as employee rank and salary, or department name and location, that are fundamental to a database application. Quality requirements for data quality (e.g., credibility, timeliness, and completeness) and product quality (e.g., trouble spots, six sigma conformance, and customer satisfaction ratings) become important after application requirements have been identified. These quality requirements need to be explicitly incorporated as an integral part of conceptual design in order to ensure that quality requirements are built into the database system.
The identification of quality requirements must begin at the requirements analysis stage because they are application and user-dependent. In this research, however, we assume that the requirements analysis phase has been completed, and the quality requirements given.
Assumption 2: Meta-quality of the data in the database can be modeled through data quality constructs developed in this research.
Under certain circumstances, such as military applications, the user may want to know the quality of the quality (meta-quality) of the data in the DBMS. An attribute-based model was proposed to incorporate meta-quality up to any number of levels depending on the users' requirements [Wang, Reddy, & Kon, 1992] . Meta-quality can also be modeled in conceptual database design. To focus on key issues related to quality requirements, we will not further discuss meta-quality in this paper.
Assumption 3: Database designers may not always be able to distinguish categorically quality requirements from application requirements.
Many user requirements could be categorized as application oriented, or as product quality oriented. For example, the balance of a bank account is clearly application oriented. It is less clear, however, whether the amount of time taken to perform a transaction is application or quality oriented. It may be considered a product requirement for a customer; alternatively, it may be modeled as a quality requirement for the bank. Under such circumstances, database designers must work with users to determine which requirements are product, as opposed to product-quality, oriented.
Assumption 1 states that quality requirements, which include product quality requirements and database data quality requirements, need to be incorporated as an integral part of a conceptual design. Since the conceptual design process for product requirements is well understood [Ullman, 1982; Korth & Silberschatz, 1986; Date, 1990; Teorey, 1990; Elmasri & Navathe, 1994], it is a logical foundation upon which to model quality requirements. Hence, we propose product requirements modeling as the first step (see Figure 2) in a quality database design methodology.

The second step is to model product quality requirements. By Assumption 3, it may not always be easy to distinguish quality requirements from application requirements. Therefore, as in conventional database design and software engineering practices, these two steps may need to be iterated until a conceptual design is produced, in which the part that represents product requirements is separated from that which represents product quality requirements.
Since both product requirements and product quality requirements will ultimately be captured and stored as data in a DBMS, it is necessary to ensure that a mechanism is available to measure the quality of the data in the DBMS. This is the task involved in Step 3.
We summarize the preceding analysis in the following quality de-coupling principle, which we refer to as the Chinese Wall Principle to emphasize the need for the separation of these requirements from the functional decomposition perspective:
The Chinese Wall Principle: Product requirements should be modeled first, product quality requirements second, and data quality requirements third, although this may be an iterative process. These three types of requirements should be separated.
As shown in Figure 2, this principle emphasizes that there are three different kinds of requirements: product requirements, product quality requirements, and data quality requirements. The arrow from Step 1 to Step 2 signifies that product requirements modeling precedes product quality requirements modeling. Similarly, the arrows from Steps 1-2 to Step 3 signify that product and product quality requirements need to be modeled before data quality requirements. This principle still holds in those situations where Step 2 is not required. We elaborate on product requirements, product quality requirements, and data quality requirements below.
Product requirements correspond to traditional application requirements. In general, product requirements can be classified as manufactured products, such as cars, service products, such as hotel accommodation, and information products, such as financial databases.
Product quality requirements correspond to the quality aspect of a product, as described in the quality literature [Crosby, 1979; Juran, 1979; Taguchi, 1979; Juran & Gryna, 1980; Taguchi, 1981; Deming, 1982; Garvin, 1983; Crosby, 1984; Cullen & Hollingham, 1987; Garvin, 1987; Garvin, 1988; Hauser & Clausing, 1988; Juran, 1989; ISO, 1992; Juran, 1992]. These requirements may arise at the input, process, and output stages associated with the production of a product. Such requirements include measurements against standards and comparisons of final products to a set of initial design specifications.
Data quality requirements correspond to the quality of data in the database that represent both the product and product quality requirements in the real world. In general, data in a database may be of poor quality because it does not reflect real world conditions [Liepins, Garfinkel, & Kunnathur, 1982; Morey, 1982; Laudon, 1986; Oman & Ayers, 1988; Liepins, 1989; Liepins & Uppuluri, 1990; Redman, 1992; Wang & Kon, 1993]. It is also well-accepted that data quality is a multi-dimensional concept, including dimensions such as accuracy, completeness, timeliness, and consistency [Yu & Neter, 1973; Cushing, 1974; Bodnar, 1975; Kriebel, 1979; Johnson, Leitch, & Neter, 1981; Morey, 1982; Bailey & Pearson, 1983; Ives, Olson, & Baroudi, 1983; Knechel, 1983; Ballou & Pazer, 1985b; Knechel, 1985; Ballou & Pazer, 1987; Oman & Ayers, 1988; Delone & McLean, 1992; Redman, 1992; Wang, Reddy, & Kon, 1992; Wang, Kon, & Madnick, 1993; Wang, Strong, & Guarascio, 1993]. Identifying which data quality dimensions are important for a particular product and its corresponding product quality is a task that must be carried out by the design team (database designers and users).
We have presented the fundamental assumptions that underlie this research and the Chinese Wall Principle that helps to guide the development of a quality requirement design methodology. The next section defines Quality Entity-Relationship constructs and analyzes various cases that arise due to different combinations of these constructs.
Different approaches may be employed to model quality requirements. For example, a data quality requirements analysis and modeling approach was proposed [Wang, Kon, & Madnick, 1993] in which quality requirements were captured through meta-attributes. This approach was further extended [Tu & Wang, 1993; Wang, Reddy, & Kon, 1994] by modeling meta-attributes through an extension to the relational model [Codd, 1970; Codd, 1979; Codd, 1990], and an attribute-based model was proposed. However, the attribute-based approach has two main weaknesses. First, it captures quality requirements primarily at the logical level. Second, it requires a significant modification of existing relational DBMS's in order to incorporate quality requirements.
In contrast, this research is designed to build upon the Entity-Relationship (ER) model [Chen, 1976; Teorey, Yang, & Fry, 1986] in order to exploit existing DBMS technologies. Through the use of Quality Entity-Relationship (QER) constructs and two representations derived from the gerund entity type (defined below), this research models meta-attributes based on the existing ER constructs, and therefore, all the quality requirements, once modeled conceptually, can be implemented in existing relational DBMS's.
In the entity-relationship model, an entity is a concrete object (e.g., a person, an animal, or a chair) or a conceptual object (e.g., a set, a theory, or a project) that can be distinctly identified. A relationship is an association among entities. Although a relationship can only exist among entities, it can also be represented as a gerund-type entity [Chen, 1993] . An attribute is a function that represents a property of an entity or relationship.
To capture quality requirements in a conceptual design, it is crucial to model how entities, relationships, and attributes among products, product quality, and data quality associate with one another. To provide an analytical foundation for doing so, we introduce the notions of quality entity and quality relationship below. For clarity, we refer to the entities and relationships that capture product requirements in conceptual design as application entities and application relationships hereafter.
Following the entity-relationship model, we define a quality entity as an entity whose corresponding semantics reflect a quality aspect of an application entity. A quality relationship is defined as an association between a quality entity and one or more application or quality entities. An attribute attached to a quality entity or quality relationship is referred to as a quality attribute. Quality entities, quality relationships, and quality attributes are all created based on the quality requirements given by the user. Quality entities that define product quality requirements are referred to as product quality entities, whereas those that define database quality data are referred to as data quality entities.
A product quality entity captures a quality aspect of an application entity. For example, the product quality entity Trouble Spot can be used to represent information on the various components of a car, upon which its quality is rated. The primary key of Trouble Spot could be "TS-Name" with values such as 'engine', 'wheel', and 'air-conditioner'. A related product quality entity is TS Rating, with primary key attribute "Rating" and values such as 'above average', 'average', and 'below average', and a textual non-key attribute "description". Other product quality entities include Defective-Rate, Customer-Satisfaction, and Car-Recall.
Data Quality Dimensions and Data Quality Measures are two important kinds of data quality entities. A data-quality-dimensions entity captures dimensions such as accuracy, timeliness, completeness, and consistency [Ballou & Pazer, 1985a]. It may have data-quality-dimension names as its primary key attribute, with values such as 'timeliness', 'completeness', and 'accuracy' (with no other attributes). A degenerate case of the above occurs when the user is interested only in one data quality dimension, say, accuracy. Another example occurs when the user always wants to know information on timeliness and accuracy. In this case, the data-quality-dimension entity could have attributes "accuracy" and "timeliness" (which together form a compound primary key).
A data-quality-measure entity describes a measurement instrument for data quality. It may have "Rating" as its primary key attribute, with values such as 'perfect', 'excellent', 'good', and 'poor', and "description" as a textual non-key attribute describing what the rating values mean. Alternatively, a data-quality-measure entity may have heterogeneous measurement scales such as 'yes' or 'no', or a percentage between 0 and 100.
A product quality relationship associates a product entity with a product quality entity. For example, a consumer might be interested in knowing what kind of trouble spots a particular type of car might have. This can be modeled by the quality relationship Car Type has Trouble Spot (see Figure 3). To avoid redundancy, TS Rating is modeled as a separate quality entity from Trouble Spot and then associated with the quality relationship. However, a relationship cannot be directly associated with an entity in the entity-relationship model. Therefore, we first convert the quality relationship to a gerund, called Car-Type-Trouble-Spot, and then create a relationship between the gerund and TS Rating. Since the relationship between Car Type and Trouble Spot is many-to-many, the primary key of the gerund is the concatenation of the primary keys of Car Type and Trouble Spot.
This representation can be applied to a situation where a product entity is associated with a product quality entity that can have various categories, each of which has a corresponding rating scale and interpretation. We refer to this as the Entity Gerund Representation.

A data quality entity may be created to keep track of data quality for attributes of application entities and product quality entities. For example, the application entity Rental Car has "mileage" as an attribute (see Figure 4) for which the user might be interested in its data quality ratings. However, there is no direct mechanism in the entity-relationship model to associate an attribute (e.g., the "mileage" attribute) of one entity (e.g., Rental Car) with another entity (e.g., data quality entities such as Data Quality Dimension and Data Quality Measure). Moreover, "mileage" is an application attribute and by the Chinese Wall Principle, an application requirement must remain part of the application specification, and not become an attribute of a quality relationship or entity.
To resolve this problem, we first create a relationship, Rental Car mileage-data-has DQ Dimension, which has mileage as part of the name of the relationship. We then represent this quality relationship as a gerund entity, which in turn, is associated with the data-quality-measure entity, thus allowing for the retrieval of the description for the data quality dimension values.
This representation can be applied to a situation where a product entity attribute needs to be associated directly with a data quality entity that has various quality categories, each of which has a corresponding rating scale and interpretation. We refer to this as the Attribute Gerund Representation.
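As a rough illustration of the Attribute Gerund Representation, the sketch below renders the mileage example as SQLite tables via Python's `sqlite3` module. All table and column names are hypothetical, and the sketch assumes that the relationship between Rental Car and DQ Dimension is many-to-many, so that the gerund's key is the concatenation of the two keys; the text does not state these cardinalities. Note that "mileage" stays in the application relation, while the name of the gerund relation carries the attribute name.

```python
import sqlite3

# Hypothetical relational sketch of the Attribute Gerund Representation.
# Assumption: Rental Car and DQ Dimension participate many-to-many, so the
# gerund is keyed on (licence_no, dimension_name).
ATTRIBUTE_GERUND_SCHEMA = """
CREATE TABLE rental_car (licence_no TEXT PRIMARY KEY, mileage INTEGER);
CREATE TABLE dq_dimension (dimension_name TEXT PRIMARY KEY);
CREATE TABLE dq_measure (rating TEXT PRIMARY KEY, description TEXT);
-- Gerund entity for "Rental Car mileage-data-has DQ Dimension".
CREATE TABLE rental_car_mileage_dq_dimension (
    licence_no TEXT NOT NULL REFERENCES rental_car (licence_no),
    dimension_name TEXT NOT NULL REFERENCES dq_dimension (dimension_name),
    rating TEXT NOT NULL REFERENCES dq_measure (rating),
    PRIMARY KEY (licence_no, dimension_name)
);
"""


def mileage_quality(conn: sqlite3.Connection, licence_no: str) -> list:
    """Return (dimension, rating, description) rows for one car's mileage data."""
    return conn.execute(
        """
        SELECT g.dimension_name, g.rating, m.description
        FROM rental_car_mileage_dq_dimension AS g
        JOIN dq_measure AS m ON m.rating = g.rating
        WHERE g.licence_no = ?
        ORDER BY g.dimension_name
        """,
        (licence_no,),
    ).fetchall()
```

The join through the gerund relation is what allows the description of each data quality rating to be retrieved without adding any quality column to `rental_car`.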

We have defined the main constructs necessary for defining quality entities, quality relationships, and quality attributes. The following subsections analyze the complexities that may arise when these constructs are applied to an application.
Drawing upon the Chinese Wall Principle, we analyze, in this section, how a quality relationship can be incorporated into a conceptual design. There are three categories of quality relationships among product entities, product quality entities, and data quality entities: (1) a product entity to a product quality entity, (2) a product entity to a data quality entity, and (3) a product quality entity to a data quality entity.
Because it is the cardinalities of the participating entities that determine how a quality relationship is defined in the conceptual model (instead of which categories of entities are involved) and because the Chinese Wall Principle applies equally to each of these three categories, it does not matter which category we analyze. For convenience, we present our analysis in the context of the first category, i.e., possible quality relationships between a product entity and a product quality entity (using Car Type and Trouble Spot). There are four mutually exclusive and collectively exhaustive cases of cardinalities:
Case 1: [0,N] & [0,N]
Case 2: [1,1] & [0,N]
Case 3: [0,N] & [1,1]
Case 4: [1,1] & [1,1]
This is the most general case, which can be represented in various ways. We first represent it using the entity gerund representation, and then discuss four alternative representations. The entity gerund representation is advocated in this case because it not only provides a very clear separation of the product entities from their product quality entities, but also represents visually the connections among all of the entities.
Case 1.1: Entity-Gerund Representation
To associate a trouble spot rating with each trouble spot of a given car type in which both entities Car Type and Trouble Spot have [0,N] cardinalities, apply the entity gerund representation to produce the gerund Car-Type-Trouble-Spot. This is exemplified in Figure 5 and the corresponding conceptual design shown below it.

Car Type: [Make, Model, Year, #sold]
Trouble Spot: [TS-Name, inspection-method]
TS Rating: [Scale, interpretation]
Car-Type-Trouble-Spot: [Make, Model, Year, TS-Name]
Car Type has Trouble Spot
[0,N] [0,N]
Car-Type-Trouble-Spot has TS Rating
[1,1] [0,N]
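The Case 1.1 design can be sketched relationally as follows, again using Python's `sqlite3` module with hypothetical table and column names. The gerund Car-Type-Trouble-Spot becomes a relation keyed on the concatenated keys of Car Type and Trouble Spot, and, because it is the [1,1] side of its relationship with TS Rating, it carries "Scale" as a foreign key. This is one reading of the design under those assumptions, not the paper's own schema.

```python
import sqlite3

# Hypothetical relational sketch of the entity gerund representation (Case 1.1).
ENTITY_GERUND_SCHEMA = """
CREATE TABLE car_type (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    num_sold INTEGER,
    PRIMARY KEY (make, model, year)
);
CREATE TABLE trouble_spot (ts_name TEXT PRIMARY KEY, inspection_method TEXT);
CREATE TABLE ts_rating (scale TEXT PRIMARY KEY, interpretation TEXT);
-- Gerund entity: key is the concatenation of the Car Type and Trouble Spot keys.
CREATE TABLE car_type_trouble_spot (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    ts_name TEXT NOT NULL REFERENCES trouble_spot (ts_name),
    scale TEXT NOT NULL REFERENCES ts_rating (scale),
    PRIMARY KEY (make, model, year, ts_name),
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year)
);
"""


def trouble_spot_report(conn: sqlite3.Connection, make: str, model: str, year: int) -> list:
    """Return (trouble spot, scale, interpretation) rows for one car type."""
    return conn.execute(
        """
        SELECT g.ts_name, g.scale, r.interpretation
        FROM car_type_trouble_spot AS g
        JOIN ts_rating AS r ON r.scale = g.scale
        WHERE g.make = ? AND g.model = ? AND g.year = ?
        ORDER BY g.ts_name
        """,
        (make, model, year),
    ).fetchall()
```

Because the interpretation lives only in `ts_rating`, each rating's meaning is stored once, which is the redundancy the separate TS Rating entity is meant to avoid.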
Case 1.2: Ternary Relationship Representation
The association among the entities Car Type, Trouble Spot, and TS Rating could be represented as a ternary relationship instead (see Figure 6). The key of the relationship relation that represents this ternary relationship would be the concatenation of the keys of Car Type and Trouble Spot; it would be over-specified if the key of TS Rating were also included [Teorey et al., 1986].

Case 1.3: Relationship Attribute Representation
Since this is a many-to-many relationship, another option, following the normal database design rules [Teorey et al., 1986], is to make "ts-rating" an attribute of the relationship Car Type has Trouble Spot. However, this option does not model the descriptive, interpretative information that might be needed for the ratings. Therefore, it is practical only if the numerical value of the trouble spot rating of a given car type for a given trouble spot is self-explanatory.
Case 1.4: Solo Entity Representation
TS Rating could be treated as a separate entity with descriptive information as an attribute. It would then become a solo entity and would appear in a relational model as a look-up table. Although not violating the Chinese Wall Principle, this alternative does not make explicit the association between a given trouble spot of a given car type and TS Rating.
Case 1.5: Non-key Attribute Representation
As a final alternative, consider the following representation in which each of the ratings is treated as a non-key attribute of the product entity Car Type, i.e.,
Car Type: [Make, Model, Year, #sold, ts-engine-rating, ts-brakes-rating, ts-wheel-rating, ts-locking-system-rating, ....]
This alternative, however, does not allow for descriptive information on the ratings, nor does it comply with the Chinese Wall Principle.
The worst trouble spot for a given car type can be represented by:
Car Type has Worst Trouble Spot
[1,1] [0,N]
This relationship would be represented, in the relational model, by making the key of Worst Trouble Spot a foreign key in Car Type. However, doing so would violate the Chinese Wall Principle, which emphasizes the separation of the data pertaining to product entities from that pertaining to product quality entities. Again, the entity gerund representation is chosen. We first create a gerund entity Car-Type-Worst-Trouble-Spot to represent the relationship Car Type has Worst Trouble Spot, and then represent its relationship with TS Rating as:
Car-Type-Worst-Trouble-Spot has TS Rating
[1,1] [0,N]
The primary key of this gerund entity Car-Type-Worst-Trouble-Spot will be simply the key of Car Type. We do not need the concatenation of the primary key of Car Type with that of Trouble Spot because the primary key of Car Type can provide the unique identification. The gerund entity has a non-key attribute "worst-ts-name" (the key of Worst Trouble Spot). This is exemplified in Figure 7 and the corresponding conceptual design that follows.

Car Type: [Make, Model, Year, #sold]
Worst Trouble Spot: [TS-Name, inspection-method]
TS Rating: [Scale, interpretation]
Car-Type-Worst-Trouble-Spot: [Make, Model, Year, ts-name]
Car Type has Worst Trouble Spot
[1,1] [0,N]
Car-Type-Worst-Trouble-Spot has TS Rating
[1,1] [0,N]
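The following sketch, with hypothetical table and column names, shows how the Case 2 gerund might look in SQLite via Python's `sqlite3` module. The key of the gerund relation is just the key of Car Type, the worst trouble spot name is a non-key attribute, and the `car_type` relation itself gains no quality-related column.

```python
import sqlite3

# Hypothetical relational sketch of Case 2 ([1,1] & [0,N]).
WORST_SPOT_SCHEMA = """
CREATE TABLE car_type (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    num_sold INTEGER,
    PRIMARY KEY (make, model, year)
);
CREATE TABLE worst_trouble_spot (ts_name TEXT PRIMARY KEY, inspection_method TEXT);
CREATE TABLE ts_rating (scale TEXT PRIMARY KEY, interpretation TEXT);
-- Gerund entity: the Car Type key alone identifies a row.
CREATE TABLE car_type_worst_trouble_spot (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    ts_name TEXT NOT NULL REFERENCES worst_trouble_spot (ts_name),
    scale TEXT NOT NULL REFERENCES ts_rating (scale),
    PRIMARY KEY (make, model, year),
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year)
);
"""


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """List a table's column names (table name is trusted, illustrative input)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
```

Calling `column_names(conn, "car_type")` on a database built from this schema returns only the four application attributes, so product data and product quality data stay on opposite sides of the wall. The primary key on the gerund also prevents a car type from having two worst trouble spots.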
In contrast to Case 2, the [0,N] & [1,1] cardinalities do not result in a violation of the Chinese Wall Principle. Consider the following situation. For each trouble spot (e.g., engine, brakes), there is one car type that is the best in avoiding this trouble. Thus, there can be only one car type judged the best in each trouble spot category, whereas a given car type could be the best in none or up to many categories, hence the following relationship:
Car Type has Best in Trouble Spot
[0,N] [1,1]
This relationship is represented in a relational model by making the key of Car Type a foreign key in Best in Trouble Spot. Doing so does not violate the Chinese Wall Principle, and therefore, the relationship is represented in this manner, as shown in Figure 8 and the corresponding conceptual design that follows.

Car Type: [Make, Model, Year, #sold]
Best in Trouble Spot: [TS-Name, inspection-method]
TS Rating: [Scale, interpretation]
Car Type has Best in Trouble Spot
[0,N] [1,1]
Best in Trouble Spot has TS Rating
[1,1] [0,N]
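A possible relational rendering of Case 3 is sketched below with Python's `sqlite3` module; table and column names are assumptions made for illustration. Here the key of Car Type appears as a foreign key inside the product quality relation, so the application relation `car_type` is left untouched.

```python
import sqlite3

# Hypothetical relational sketch of Case 3 ([0,N] & [1,1]).
BEST_IN_SCHEMA = """
CREATE TABLE car_type (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    num_sold INTEGER,
    PRIMARY KEY (make, model, year)
);
CREATE TABLE ts_rating (scale TEXT PRIMARY KEY, interpretation TEXT);
CREATE TABLE best_in_trouble_spot (
    ts_name TEXT PRIMARY KEY,
    inspection_method TEXT,
    -- Exactly one best car type per trouble spot category.
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    scale TEXT NOT NULL REFERENCES ts_rating (scale),
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year)
);
"""


def best_categories(conn: sqlite3.Connection, make: str, model: str, year: int) -> list:
    """Return the trouble spot categories in which a car type is judged best."""
    rows = conn.execute(
        """
        SELECT ts_name FROM best_in_trouble_spot
        WHERE make = ? AND model = ? AND year = ?
        ORDER BY ts_name
        """,
        (make, model, year),
    ).fetchall()
    return [row[0] for row in rows]
```

A car type may appear in none or in many rows of `best_in_trouble_spot`, matching its [0,N] cardinality, while each trouble spot row names exactly one car type.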
Suppose that we wish to model a situation where there is an overall ranking assigned to each car type and there can be no ties. Consider the relationship:
Car Type has Overall Ranking
[1,1] [1,1]
In this case, we do not need a separate entity for Rating because "interpretation" is an attribute of the Overall Ranking (and no redundancy occurs). This example is shown in Figure 9 and the corresponding conceptual design that follows:

Car Type: [Make, Model, Year, #sold]
Overall Ranking: [Scale, interpretation]
Car Type has Overall Ranking
[1,1] [1,1]
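The text does not spell out a relational translation for Case 4, so the sketch below is only one plausible rendering, written with Python's `sqlite3` module and hypothetical names. It assumes the Car Type key is placed in the quality relation `overall_ranking` (keeping `car_type` free of quality data) and uses a UNIQUE constraint so that no car type holds two rankings; the primary key on `scale` rules out ties. The requirement that every car type must have a ranking is not enforced by this schema.

```python
import sqlite3

# Hypothetical relational sketch of Case 4 ([1,1] & [1,1]).
# Assumption: the foreign key is placed on the quality side.
OVERALL_RANKING_SCHEMA = """
CREATE TABLE car_type (
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    num_sold INTEGER,
    PRIMARY KEY (make, model, year)
);
CREATE TABLE overall_ranking (
    scale INTEGER PRIMARY KEY,          -- one row per rank, so no ties
    interpretation TEXT,
    make TEXT NOT NULL, model TEXT NOT NULL, year INTEGER NOT NULL,
    UNIQUE (make, model, year),         -- one ranking per car type
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year)
);
"""


def try_insert_ranking(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert a ranking row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO overall_ranking VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```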
We have represented the quality relationships for four mutually exclusive and collectively exhaustive cases between entities, separated by the Chinese Wall Principle, based on the cardinalities of their relationships. There are other combinations of the QER constructs that can be analyzed, although they are less likely to occur in practice. For example, a further analysis of possible combinations between quality relationships and entities (product and product quality entities) is found in the Appendix. The next section synthesizes the concepts and research findings in an application example.
This section illustrates how quality requirements can be modeled using the research results presented in the previous sections. In doing so, we demonstrate that the conceptual design output produced by the quality-entity-relationship (QER) approach can be translated directly into a relational schema, and implemented using existing DBMS's. (Therefore, this research can be used by database practitioners in their current assignments to incorporate quality requirements into a database design.)
We synthesize and extend the running application scenario presented in the previous sections as follows: A rental car has a rental history, which includes the number of miles it was driven during each rental contract. For a given car type, it is interesting to know what trouble spots may exist, thus, ratings of trouble spots are needed, along with an interpretation of these ratings. In addition, to assure that the data are of high quality, the accuracy and timeliness information of the mileage data and the accuracy information of the scale for trouble spot ratings are needed.
A quality-entity-relationship representation of the rental car database is shown in Figure 12. The steps involved in the development of the quality-entity-relationship representation are outlined in Figure 13. In the remainder of this section, we show how this application scenario can first be translated into a set of user requirements, then a quality-entity-relationship conceptual design, and finally a relational schema.
From the above application scenario, a database designer can interact with the user to distill the user requirements, as shown in Step 1 of Figure 13. As presented in the previous section, three product entities are produced from Step 2: Rental Car, Car Type, and Rental History. Step 3 identifies Trouble Spot and TS Rating as the product quality entities. Since Trouble Spot describes the quality of Car Type, the user involved in the design team would categorize it as a quality entity. The TS Rating is also a description of a quality aspect. Therefore, both Trouble Spot and TS Rating fall on the top right side of the Chinese Wall.

In Step 4, the design team would recognize that both the attribute "mileage" of Rental Car and the attribute "scale" of TS Rating require information on their accuracy. In addition, "mileage" needs information on timeliness. Concepts such as accuracy and timeliness relate to the quality of the data in the database and, therefore, are represented by data quality entities.
Steps 5-7 model quality relationships using entity gerund representations. Note that the attribute gerund representation is applied to associate the product quality attribute "scale" with the data quality entity DQ Dimension. Finally, Steps 8-10 list all the product relationships, product quality relationships, and data quality relationships.
Note that each of the three categories of quality relationships has been illustrated in this example: (1) product entity and product quality entity; (2) product entity and data quality entity; and (3) product quality entity and data quality entity.
Every entity becomes a separate entity relation. A relationship is represented either by a foreign key or by a separate relationship relation. The rules followed to do so are the normal ones for database design. That is, when the min/max cardinalities of one entity are [1,1], the relationship is represented in the relational model by making the key of the other entity a foreign key in the entity relation of the entity with the [1,1] cardinality. Teorey et al. [1986] call this an extended entity relation. When neither entity has [1,1] cardinalities, the relationship is represented as a separate relationship relation whose key is the concatenation of the keys of the involved entities and whose non-key attributes are the relationship attributes [Teorey, Yang, & Fry, 1986; Storey & Goldstein, 1988].
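The two mapping rules above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class names `Entity` and `Relationship`, the function `map_relationship`, and the tuple encoding of cardinalities (with the string "N" for an unbounded maximum) are hypothetical and are not part of the QER approach or of the cited methodologies.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    key: list                                   # primary-key attributes
    attrs: list = field(default_factory=list)   # non-key attributes


@dataclass
class Relationship:
    left: Entity
    right: Entity
    left_card: tuple     # (min, max) of the left entity, e.g. (1, 1)
    right_card: tuple    # (min, max) of the right entity, e.g. (0, "N")
    attrs: list = field(default_factory=list)   # relationship attributes


def map_relationship(rel):
    """Return (relation name, attribute list) for the relation representing rel."""
    if rel.left_card == (1, 1):
        holder, other = rel.left, rel.right
    elif rel.right_card == (1, 1):
        holder, other = rel.right, rel.left
    else:
        # Neither side is [1,1]: separate relationship relation whose key is
        # the concatenation of both entity keys, followed by its attributes.
        name = f"{rel.left.name}-{rel.right.name}"
        return name, rel.left.key + rel.right.key + rel.attrs
    # Extended entity relation: the [1,1] entity receives the other entity's
    # key as a foreign key; attributes it already holds are not repeated.
    columns = holder.key + holder.attrs
    columns += [k for k in other.key if k not in columns]
    return holder.name, columns


rental_car = Entity("Rental Car", ["Licence#"], ["mileage", "rate"])
car_type = Entity("Car Type", ["Make", "Model", "Year"], ["#sold"])
trouble_spot = Entity("Trouble Spot", ["TS-Name"], ["inspection-method"])

# Rental Car [1,1] has Car Type [0,N]: foreign key placed in Rental Car.
extended = map_relationship(Relationship(rental_car, car_type, (1, 1), (0, "N")))
# Car Type [0,N] has Trouble Spot [0,N]: separate relationship relation.
separate = map_relationship(Relationship(car_type, trouble_spot, (0, "N"), (0, "N")))
```

When both entities have [1,1] cardinalities, this sketch arbitrarily places the foreign key in the left entity; that choice is ours and is not prescribed by the rules above.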
The relationship from which a gerund entity is created is inherently represented. This is because the relationship is first converted to an entity. In the case of a many-to-many relationship, the primary key of the gerund entity is the concatenation of the keys of the involved entities, which is exactly what the primary key of the corresponding relationship relation would be. The entity is then represented by an entity relation in the relational model. Both the entity relation and relationship relation would be exactly the same: they would include the concatenation of the two primary keys and there would be no non-key attributes. Of course, a foreign key may be added to this relation to represent another relationship as is the case when representing the relationship Car-Type-Trouble-Spot has TS Rating. The corresponding relational model is shown in Figure 14.
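The gerund conversion described above can be illustrated with a short Python sketch. The names `Entity`, `to_gerund`, and `add_foreign_key` are hypothetical helpers introduced only for this illustration; the sketch assumes a many-to-many relationship without relationship attributes, as in the Car Type has Trouble Spot case.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    key: list                                   # primary-key attributes
    attrs: list = field(default_factory=list)   # non-key attributes


def to_gerund(left, right):
    """Convert a many-to-many relationship between two entities into an
    entity gerund whose key is the concatenation of both entity keys."""
    name = f"{left.name}-{right.name}".replace(" ", "-")
    return Entity(name=name, key=left.key + right.key)


def add_foreign_key(entity, other):
    """Return a copy of entity with the key of other added as non-key
    attributes, representing a further [1,1] relationship."""
    return Entity(entity.name, list(entity.key), entity.attrs + other.key)


car_type = Entity("Car Type", ["Make", "Model", "Year"], ["#sold"])
trouble_spot = Entity("Trouble Spot", ["TS-Name"], ["inspection-method"])
ts_rating = Entity("TS Rating", ["Scale"], ["interpretation"])

# The gerund has only key attributes, exactly like a relationship relation.
gerund = to_gerund(car_type, trouble_spot)
# Car-Type-Trouble-Spot has TS Rating adds the rating scale as a foreign key.
rated = add_foreign_key(gerund, ts_rating)
```

In this sketch the gerund carries no non-key attributes until the foreign key for TS Rating is added, which mirrors the observation that the entity relation and the relationship relation coincide.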
The output from each step is shown directly below it.
Step 1: Identify user requirements:
What rental cars exist, their mileage and rates.
Accuracy and timeliness measures of a rental car's mileage, along with a description of what the accuracy and timeliness values mean.
The car type of a rental car, including the number sold of a given car type.
The rental history of a car, in particular, the number of miles driven each time the car was rented.
The types of trouble spots a car type can have, how they are rated, their rating scales and an interpretation of these rating scales.
For each rating of a trouble spot, a measure of its accuracy is needed and an interpretation of its accuracy value.
Step 2: Identify product entities:
Car Type: [Make, Model, Year, #sold]
Rental Car: [Licence#, mileage, rate]
Rental History: [Licence#, Date-Rented, #miles]
Step 3: Identify corresponding product quality entities:
Trouble Spot: [TS-Name, inspection-method]
TS Rating: [Scale, interpretation]
Step 4: Identify data quality entities:
DQ Dimension: [D-Name, value]
DQ Measure: [Value, description]
Step 5: Associate product entities with product quality entities:
Car-Type-Trouble-Spot: [Make, Model, Year, TS-Name]
Step 6: Associate product attributes with data quality entities:
Rental-Car-mileage-DQ-Dimension: [Licence#, D-Name]
Step 7: Associate product quality attributes with data quality entities:
TS Rating-scale-DQ-Dimension: [Scale, D-Name]
Step 8: List the product relationships:
Rental Car has Rental History
[0,N] [1,1]
Rental Car has Car Type
[1,1] [0,N]
Step 9: List the product quality relationships:
Car Type has Trouble Spot
[0,N] [0,N]
Car-Type-Trouble-Spot has TS Rating
[1,1] [0,N]
Step 10: List the data quality relationships:
Rental Car mileage-data-has DQ Dimension
[1,N] [1,N]
Rental-Car-mileage-DQ-Dimension has DQ Measure
[1,1] [0,N]
TS Rating scale-data-has DQ Dimension
[0,N] [0,N]
TS Rating-scale-DQ-Dimension has DQ Measure
[1,1] [0,N]
Entity Relations:
Car Type: [Make, Model, Year, #sold]
Rental History: [Licence#, Date-Rented, #miles]
Trouble Spot: [TS-Name, inspection-method]
TS Rating: [Scale, interpretation]
DQ Dimension: [D-Name, value]
DQ Measure: [Value, description]
Extended Entity Relations:
Rental Car: [Licence#, mileage, rate, make, model, year]
Car-Type-Trouble-Spot: [Make, Model, Year, TS-Name, scale]
Rental-Car-mileage-DQ-Dimension: [Licence#, D-Name, value]
TS Rating-scale-DQ-Dimension: [Scale, D-Name, value]
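To suggest how the relations above might be implemented in an existing DBMS, the following Python sketch creates them in an in-memory SQLite database and retrieves a car's mileage together with its data quality tags. Several details are our assumptions and do not come from the design itself: the SQL-safe column names (for example `licence_no` for Licence# and `num_sold` for #sold), the column types, the choice of (Licence#, Date-Rented) as the key of Rental History, and all sample values.

```python
import sqlite3

SCHEMA = """
-- Entity relations
CREATE TABLE car_type (
    make TEXT, model TEXT, year INTEGER, num_sold INTEGER,
    PRIMARY KEY (make, model, year));
CREATE TABLE trouble_spot (ts_name TEXT PRIMARY KEY, inspection_method TEXT);
CREATE TABLE ts_rating (scale TEXT PRIMARY KEY, interpretation TEXT);
CREATE TABLE dq_dimension (d_name TEXT PRIMARY KEY, value TEXT);
CREATE TABLE dq_measure (value TEXT PRIMARY KEY, description TEXT);
-- Extended entity relations (foreign keys carry the [1,1] relationships)
CREATE TABLE rental_car (
    licence_no TEXT PRIMARY KEY, mileage INTEGER, rate REAL,
    make TEXT, model TEXT, year INTEGER,
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year));
CREATE TABLE rental_history (
    licence_no TEXT REFERENCES rental_car (licence_no),
    date_rented TEXT, num_miles INTEGER,
    PRIMARY KEY (licence_no, date_rented));  -- assumed key
CREATE TABLE car_type_trouble_spot (
    make TEXT, model TEXT, year INTEGER,
    ts_name TEXT REFERENCES trouble_spot (ts_name),
    scale TEXT REFERENCES ts_rating (scale),
    PRIMARY KEY (make, model, year, ts_name),
    FOREIGN KEY (make, model, year) REFERENCES car_type (make, model, year));
CREATE TABLE rental_car_mileage_dq_dimension (
    licence_no TEXT REFERENCES rental_car (licence_no),
    d_name TEXT REFERENCES dq_dimension (d_name),
    value TEXT REFERENCES dq_measure (value),
    PRIMARY KEY (licence_no, d_name));
CREATE TABLE ts_rating_scale_dq_dimension (
    scale TEXT REFERENCES ts_rating (scale),
    d_name TEXT REFERENCES dq_dimension (d_name),
    value TEXT REFERENCES dq_measure (value),
    PRIMARY KEY (scale, d_name));
"""


def build_demo_db():
    """Create the schema and load a few hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign keys
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO car_type VALUES ('MakeA', 'ModelX', 1993, 1000)")
    conn.execute(
        "INSERT INTO rental_car VALUES ('ABC123', 42000, 39.5, 'MakeA', 'ModelX', 1993)")
    conn.executemany("INSERT INTO dq_dimension (d_name) VALUES (?)",
                     [("accuracy",), ("timeliness",)])
    conn.executemany("INSERT INTO dq_measure VALUES (?, ?)",
                     [("high", "verified against the odometer"),
                      ("low", "not updated since the last rental")])
    conn.executemany("INSERT INTO rental_car_mileage_dq_dimension VALUES (?, ?, ?)",
                     [("ABC123", "accuracy", "high"),
                      ("ABC123", "timeliness", "low")])
    conn.commit()
    return conn


def mileage_with_quality(conn, licence_no):
    """Return (licence, mileage, dimension, measure, description) rows."""
    return conn.execute(
        """SELECT c.licence_no, c.mileage, q.d_name, m.value, m.description
           FROM rental_car AS c
           JOIN rental_car_mileage_dq_dimension AS q ON q.licence_no = c.licence_no
           JOIN dq_measure AS m ON m.value = q.value
           WHERE c.licence_no = ?
           ORDER BY q.d_name""", (licence_no,)).fetchall()
```

A call such as `mileage_with_quality(build_demo_db(), "ABC123")` returns one row per data quality dimension recorded for the mileage of that car, so an application can present the mileage alongside its accuracy and timeliness measures.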
We have presented a quality-entity-relationship (QER) approach for incorporating the quality requirements (database quality data and product quality data) of an application into conceptual database design. The underlying premise of this research is that quality requirements should be distinct from application requirements. For this, we introduced the Chinese Wall Principle to ensure the separation of product entities, product quality entities, and data quality entities. Quality entity, quality relationship, and quality attribute constructs were introduced to extend the traditional entity-relationship model. Based on these constructs, an analysis was performed to identify the various representations that could be used by database designers when modeling quality requirements. To illustrate and synthesize the research results, an application example was also presented.
Several new research areas could be further explored: (1) an analysis of integrity constraints within the context of quality entity-relationships and the development of a QER algebra that are fundamental for a QER model; (2) an object-oriented approach to modeling quality requirements; and (3) field studies of the QER approach in industrial applications to validate the research results.
In summary, quality requirements have been largely overlooked in previous database research. The QER approach presented in this research has extended traditional conceptual design methodologies and led to a new direction for further investigation. Moreover, it benefits practitioners because this approach can be incorporated directly into current database design practices, and the QER conceptual design output can be directly translated into a relational schema that can be implemented in existing DBMSs.
[1] Bailey, J. E. & Pearson, S. W. (1983). Development of a Tool for Measuring and Analyzing Computer User Satisfaction. Management Science, 29(5), 530-545.
[2] Ballou, D. P. & Pazer, H. L. (1985a). Modeling Data and Process Quality in Multi-input, Multi-output Information Systems. Management Science, 31(2), 150-162.
[3] Ballou, D. P. & Pazer, H. L. (1985b). Process improvement versus enhanced inspection in optimized systems. International Journal of Production Research, 23(6), 1233-1245.
[4] Ballou, D. P. & Pazer, H. L. (1987). Cost/Quality Tradeoffs for Control Procedures in Information Systems. OMEGA: International Journal of Management Science, 15(6), 509-521.
[5] Bernstein, P. A. & Goodman, N. (1981). Concurrency Control in Distributed Database Systems. Computing Surveys, 13(2), 185-221.
[6] Bodnar, G. (1975). Reliability Modeling of Internal Control Systems. The Accounting Review, 50(4), 747-757.
[7] Chen, P. P. (1976). The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, 1, 166-193.
[8] Chen, P. S. (1993). The Entity-Relationship Approach. In Information Technology in Action: Trends and Perspectives. (pp. 13-36). Englewood Cliffs: Prentice Hall.
[9] Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377-387.
[10] Codd, E. F. (1979). Extending the relational database model to capture more meaning. ACM Transactions on Database Systems, 4(4), 397-434.
[11] Codd, E. F. (1990). The relational model for database management: version 2. Reading: Addison-Wesley.
[12] Crosby, P. B. (1979). Quality is Free. New York: McGraw-Hill.
[13] Crosby, P. B. (1984). Quality Without Tears. New York: McGraw-Hill Book Company.
[14] Cullen, J. & Hollingham (1987). Implementing Total Quality. London: IFS (Publications) Ltd.
[15] Cushing, B. E. (1974). A Mathematical Approach to the Analysis and Design of Internal Control Systems. Accounting Review, 49(1), 24-41.
[16] Date, C. J. (1990). An Introduction to Database Systems. Reading: Addison-Wesley.
[17] Delone, W. H. & McLean, E. R. (1992). Information Systems Success: The Quest for the Dependent Variable. Information Systems Research, 3(1), 60-95.
[18] Deming, W. E. (1986). Out of the Crisis. Cambridge: Center for Advanced Engineering Study, Massachusetts Institute of Technology.
[19] Deming, W. E. (1982). Quality, Productivity, and Competitive Position. Cambridge: MIT Center for Advanced Engineering Study.
[20] Elmasri, R. & Navathe, S. (1994). Fundamentals of Database Systems. Reading, MA: The Benjamin/Cummings Publishing Co., Inc.
[21] Fernandez, E. B., Summers, R. C., & Wood, C. (1981). Database Security and Integrity. Reading: Addison-Wesley.
[22] Garvin, D. A. (1983). Quality on the line. Harvard Business Review, 61(5), 65-75.
[23] Garvin, D. A. (1987). Competing on the eight dimensions of quality. Harvard Business Review, 65(6), 101-109.
[24] Garvin, D. A. (1988). Managing Quality-The Strategic and Competitive Edge. New York: The Free Press.
[25] Hauser, J. R. & Clausing, D. (1988). The House of Quality. Harvard Business Review, 66(3), 63-73.
[26] Ishikawa, K. (1985). What is Total Quality Control?-the Japanese Way. Englewood Cliffs: Prentice-Hall.
[27] ISO (1992). ISO9000 International Standards for Quality Management. Geneva: International Standard Organization.
[28] Ives, B., Olson, M. H., & Baroudi, J. J. (1983). The Measurement of User Information Satisfaction. Communications of the ACM, 26(10), 785-793.
[29] Johnson, J. R., Leitch, R. A., & Neter, J. (1981). Characteristics of Errors in Accounts Receivable and Inventory Audits. Accounting Review, 56(2), 270-293.
[30] Juran, J. M. (1979). Quality Control Handbook. New York: McGraw-Hill Book Co.
[31] Juran, J. M. (1989). Juran on Leadership for Quality: An Executive Handbook. New York: The Free Press.
[32] Juran, J. M. (1992). Juran on Quality by Design: The New Steps for Planning Quality into Goods and Services. New York: Free Press.
[33] Juran, J. M. & Gryna, F. M. (1980). Quality Planning and Analysis. New York: McGraw Hill.
[34] Knechel, W. R. (1983). The Use of Quantitative Models in the Review and Evaluation of Internal Control: A Survey and Review. Journal of Accounting Literature, 2, 205-219.
[35] Knechel, W. R. (1985). A Simulation Model for Evaluating Accounting Systems Reliability. Auditing: A Journal of Theory and Practice, 4(2), 38-62.
[36] Korth, H. & Silberschatz, A. (1986). Database System Concepts. New York: McGraw-Hill Book Company.
[37] Kriebel, C. H. (1979). Evaluating the Quality of Information Systems. In Design and Implementation of Computer Based Information Systems. (pp. 29-43). Germantown: Sijthoff & Noordhoff.
[38] Laudon, K. C. (1986). Data Quality and Due Process in Large Interorganizational Record Systems. Communications of the ACM, 29(1), 4-11.
[39] Liepins, G. E., Garfinkel, R. S., & Kunnathur, A. S. (1982). Error localization for erroneous data: A survey. TIMS/Studies in the Management Science, 19, 205-219.
[40] Liepins, G. E. (1989). Sound Data Are a Sound Investment. Quality Progress, 22(9), 61-64.
[41] Liepins, G. E. & Uppuluri, V. R. R. (Ed.). (1990). Data Quality Control: Theory and Pragmatics. New York: Marcel Dekker, Inc.
[42] Morey, R. C. (1982). Estimating and Improving the Quality of Information in the MIS. Communications of the ACM, 25(5), 337-342.
[43] Oman, R. C. & Ayers, T. B. (1988). Improving Data Quality. Journal of Systems Management, 39(5), 31-35.
[44] Redman, T. C. (1992). Data Quality: Management and Technology. New York: Bantam Books.
[45] Storey, V. & Goldstein, R. (1988). A Methodology for the Creation of User Views During Database Design. ACM Transactions on Database Systems (TODS), 13(3), 305-338.
[46] Taguchi, G. (1979). Introduction to Off-line Quality Control. Magaya, Japan: Central Japan Quality Control Association.
[47] Taguchi, G. (1981). On-line Quality Control during Production. Tokyo: Japanese Standards Association.
[48] Teorey, T. J. (1990). Database Modeling and Design: The Entity-Relationship Approach. San Mateo, CA: Morgan Kaufmann Publishers.
[49] Teorey, T. J., Yang, D., & Fry, J. P. (1986). A logical design methodology for relational databases using the extended entity-relationship model. ACM Computing Surveys, 18(2), 197-222.
[50] Tsichritzis, D. & Lochovsky, F. (1982). Data Models. Englewood Cliffs, N.J.: Prentice Hall.
[51] Tu, S. & Wang, R. Y. (1993). Modeling Data Quality and Context through Extension of the ER Model. A. Hevner & N. Kamel (Ed.), In Third Annual Workshop on Information Technologies and Systems (WITS-93), (pp. 40-47) Orlando, Florida.
[52] Ullman, J. D. (1982). Principles of Database Systems. Rockville, Maryland, USA: Computer Science Press.
[53] Wang, R. Y. & Kon, H. B. (1993). Towards Total Data Quality Management (TDQM). In Information Technology in Action: Trends and Perspectives. (pp. 179-197). Englewood Cliffs, NJ: Prentice Hall.
[54] Wang, R. Y., Kon, H. B., & Madnick, S. E. (1993). Data Quality Requirements Analysis and Modeling. In the Proceedings of the 9th International Conference on Data Engineering, (pp. 670-677) Vienna: IEEE Computer Society Press.
[55] Wang, R. Y., Reddy, M. P., & Kon, H. B. (1992). Toward Quality Data: An Attribute-based Approach. To appear in the Journal of Decision Support Systems (DSS).
[56] Wang, R. Y., Reddy, M. P., & Kon, H. B. (1994). Toward Quality Data: An Attribute-based Approach. To appear in the Journal of Decision Support Systems (DSS).
[57] Wang, R. Y., Strong, D. M., & Guarascio, L. M. (1993). Data Consumers' Perspectives of Data Quality. (No. TDQM-93-12). MIT Sloan School of Management.
[58] Yu, S. & Neter, J. (1973). A Stochastic Model of the Internal Control System. Journal of Accounting Research, 1(3), 273-295.
This section illustrates the associations between a relationship and the three types of entities (product, product quality, and data quality), even though such associations might occur less often than those between entities. As in the discussion of the cardinalities in Section 3, the analysis below is presented within the context of product requirements and product quality requirements. Specifically, we analyze the following three cases: (1) product entity to product quality relationship, (2) product relationship to product quality entity, and (3) product relationship to product quality relationship.
Figure A1 shows an example of a product entity Rental Car that the user wants to relate to the product quality relationship Trouble Spot inspected-by Car Inspector. To model this, the relationship is converted to an entity gerund Trouble-Spot-Car-Inspector. Then, the relationship Rental Car assigned Trouble-Spot-Car-Inspector is incorporated into the design. The corresponding conceptual design follows Figure A1.

Rental Car: [Licence#, mileage, date]
Trouble Spot: [TS-Name, inspection-method]
Car Inspector: [Id, name]
Trouble-Spot-Car-Inspector: [TS-Name, Car-Inspector-id]
Trouble Spot inspected-by Car Inspector
[1,N] [1,N]
Rental Car assigned Trouble-Spot-Car-Inspector: [time]
[0,N] [0,N]
A quality relationship from a product relationship, Rental Car rented-by Customer, to the product quality entity Trouble Spot is shown in Figure A2. First, the relationship Rental Car rented-by Customer is converted to the entity gerund Rental-Car-Customer. Then, the relationship Rental-Car-Customer concerned-with Trouble Spot can be created. Since this is a many-to-many relationship, it may have relationship attributes; in this case, "date-expressed." The conceptual design is given below Figure A2.

Rental Car: [Licence#, mileage, rate]
Customer: [Customer-id]
Trouble Spot: [TS-Name, inspection-method]
Rental-Car-Customer: [Licence#, Customer-id]
Rental Car rented-by Customer
[0,N] [1,N]
Rental-Car-Customer concerned-with Trouble Spot: [date-expressed]
[0,N] [0,N]
Consider the product relationship Rental Car rented-by Customer and the product quality relationship Trouble Spot inspected-by Car Inspector. An association between them is represented by converting each relationship into an entity gerund and then creating a relationship between the two entity gerunds. This is shown in Figure A3, and the conceptual design is given below it.

Rental Car: [Licence#, mileage, rate]
Trouble Spot: [TS-Name, inspection-method]
Customer: [Customer-id]
Car Inspector: [Id, name]
Trouble-Spot-Car-Inspector: [TS-Name, Car-Inspector-Id]
Rental-Car-Customer: [Licence#, Customer-id]
Rental Car rented-by Customer
[0,N] [1,N]
Trouble Spot inspected-by Car Inspector
[0,N] [1,N]
Rental-Car-Customer concerned-with Trouble-Spot-Car-Inspector: [date-expressed]
[0,N] [0,N]
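As a rough illustration of how this third case might be carried into a relational schema, the following Python sketch builds the two entity gerunds and then forms the relationship relation between them, using the rule that a relationship with no [1,1] cardinality becomes a separate relation keyed by the concatenation of the participating keys. The translation itself is our assumption, since only the conceptual design is given here; the helper names `gerund` and `relationship_relation` are hypothetical, and the key of Car Inspector is written as Car-Inspector-Id to match the gerund above.

```python
# Primary keys of the four base entities in the conceptual design.
ENTITY_KEYS = {
    "Rental Car": ["Licence#"],
    "Customer": ["Customer-id"],
    "Trouble Spot": ["TS-Name"],
    "Car Inspector": ["Car-Inspector-Id"],
}


def gerund(left, right):
    """Turn the relationship between two base entities into an entity gerund:
    (name, key), where the key concatenates the two entity keys."""
    name = f"{left}-{right}".replace(" ", "-")
    return name, ENTITY_KEYS[left] + ENTITY_KEYS[right]


def relationship_relation(first, second, attrs):
    """Build the relation for a many-to-many relationship between two gerunds."""
    (name1, key1), (name2, key2) = first, second
    return {"name": f"{name1}-{name2}", "key": key1 + key2, "attrs": list(attrs)}


rental_car_customer = gerund("Rental Car", "Customer")
trouble_spot_inspector = gerund("Trouble Spot", "Car Inspector")

# Rental-Car-Customer concerned-with Trouble-Spot-Car-Inspector: [date-expressed]
concerned_with = relationship_relation(
    rental_car_customer, trouble_spot_inspector, ["date-expressed"])
```

Under these assumptions the resulting relation has a four-attribute key drawn from both gerunds, with "date-expressed" as its only non-key attribute.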