ITAG - Information Technology Architecture Group

Guidelines on Data

ITAG Architectural Guidelines: Data

  • Institute data SHOULD NOT be released to or stored by a third party without an approved business reason.
    • Business data created or obtained within the Institute belongs to the institution, not to any particular function, unit, or individual.
    • Members of the Institute community MUST ensure that their uses of the business data of the Institute are consistent with the Institute's policy on privacy of information.
    • The designated custodian of a particular data collection MUST ensure data integrity, security, and accessibility to anyone who demonstrates need.
  • Critical and sensitive information MUST be kept on machines that are professionally managed.

  • The Warehouse SHOULD be used for reporting where possible.

  • Enterprise Data SHOULD be in the Warehouse.

  • Data feeds SHOULD come from the Warehouse where possible.

  • Systems of Record MUST be established for all shared data.

  • Social Security number MUST NOT be used or stored without a clear business need.

  • Social Security number MUST NOT be used or stored as the unique identifier.

Many of the principles embedded here were first described in 1989 in the Administrative Computing Principles. The statements applicable to data were numbers 4-9, which are listed below. The full text of these is included as an appendix to this document.

4. DATA OWNERSHIP
Business data created at or obtained within the Institute belong to the institution, not to any particular function, unit, or individual.

5. DATA PRIVACY
Members of the Institute community are responsible for ensuring that their uses of the business data of the Institute are consistent with the Institute's policy on privacy of information.

6. DATA CUSTODIANSHIP
It is the responsibility of the designated custodian of a particular data collection to ensure data integrity, security, and accessibility to all who demonstrate need.

7. INFORMATION CONSISTENCY
Administrative applications should facilitate information consistency across departments and not meet only the needs of the user-sponsor.

8. TIMELINESS
Administrative applications should ensure that business information is available when needed to affect decisions.

9. ELECTRONIC DATA TRANSFER
Administrative applications should transfer data electronically from the point of origin and to the ultimate end user.

At the core of these principles is the understanding that the Institute’s data is a valuable asset and therefore needs to be well managed. Good data management practices are based on some common-sense ideas.

  • Don’t store data unless you know why.
  • Don’t collect data that already is being collected at MIT.
  • Don’t collect data until it’s needed.
  • Don’t store data unless there is a plan to maintain it.
  • Decide data retention policies before collecting data.
  • Review data models before building a system.
  • Document the data definition and sensitivity before collection.
  • Only update data in its System of Record.

Before beginning to collect new data:
If you don’t have a very clear reason to collect and store data, don’t, because storing it is costly and could be a risk. This sounds obvious, but many people decide to collect data because it’s easy at the time and they think it might be useful later. These are not sufficient reasons by themselves. This practice tends to lead to collections of data that aren’t accurate, because there never was a real plan to use and maintain them. Data is only valuable if it is accurate and can be relied on, and the best way of assuring this is to make sure it is used. Increasing the access to and usage of data leads to improved data quality.

One common example is that SSN is collected and stored before it’s needed, because it might be hard to get later. This practice puts the Institute at risk of disclosing sensitive information. In addition, SSN is collected from all people in certain processes, even though it will be needed for only a fraction of this population later. Again, this exposes MIT to more risk.

Another common idea is that because storage is so cheap, we can keep data forever. In many cases this is fine, but for certain sensitive data it is a mistake that exposes MIT to risk. Data retention policies should be determined up front, and data should be purged in accordance with these policies.
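As an illustration only, a retention rule of this kind can be expressed as a small purge routine. The following is a minimal sketch; the category names, retention periods, and the `Record` type are hypothetical and do not reflect any actual Institute policy or system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods, in days, decided before collection begins.
RETENTION_DAYS = {"applicant_ssn": 365, "event_signup": 90}


@dataclass
class Record:
    category: str       # hypothetical data category name
    collected_on: date  # date the data was collected
    payload: str


def is_expired(record: Record, today: date) -> bool:
    """Return True when the record has outlived its category's retention period."""
    limit = RETENTION_DAYS.get(record.category)
    if limit is None:
        # No policy was decided up front: fail loudly rather than keep data forever.
        raise ValueError(f"no retention policy for category {record.category!r}")
    return today - record.collected_on > timedelta(days=limit)


def purge(records: list[Record], today: date) -> list[Record]:
    """Return only the records that are still within their retention period."""
    return [r for r in records if not is_expired(r, today)]
```

Raising an error for a category with no policy reflects the guideline that retention should be decided before data is collected, rather than defaulting to indefinite storage.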

Often at MIT, data is being collected in one area of the Institute while another department has no knowledge of this. This leads to wasted effort. To avoid this, check with Data Administration before collecting new data. For any piece of data, MIT should establish what its System of Record is, that is, which copy is the master copy of the information. In addition, the data model should be reviewed to ensure that “keys” for joining the information with other Institute data are included and that the model is consistent with related information.
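The two checks described above (updating data only in its System of Record, and confirming that a model carries the keys needed for joining) might be sketched as follows. This is an assumption-laden illustration: the registry contents, system names, and field names are all hypothetical.

```python
# Hypothetical registry mapping each shared data element to its System of Record.
SYSTEM_OF_RECORD = {
    "home_address": "hr_system",
    "course_enrollment": "registrar_system",
}


def may_update(element: str, system: str) -> bool:
    """Allow an update only when it is made in the element's System of Record."""
    if element not in SYSTEM_OF_RECORD:
        # No master copy has been designated yet; that should be settled first.
        raise KeyError(f"no System of Record established for {element!r}")
    return SYSTEM_OF_RECORD[element] == system


def missing_join_keys(model_fields: set[str], required_keys: set[str]) -> set[str]:
    """Return the join keys that a proposed data model does not include."""
    return required_keys - model_fields
```

A model review could call `missing_join_keys` with the fields of the proposed model and the keys used by related Institute data; an empty result means the model can be joined on those keys.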

If you need help answering these questions, contact data-admin@mit.edu.

Responsibilities of a Data Steward:

Share data with all who demonstrate an MIT business need.

Systems with valuable data should be hosted on professionally managed machines that are regularly patched with security updates and have regular offsite backups.

Appendix
From the 1989 document Administrative Computing Principles


PRINCIPLE 4. DATA OWNERSHIP

Business data created at or obtained within the Institute belong to the institution, not to any particular function, unit, or individual.

RATIONALE:
Data gathered or produced for the business purposes of the Institute cannot be "owned" by a single individual or functional unit within MIT. Protection of individual privacy rights and compliance with legal and fiduciary requirements mandate that the data are owned by the institution. For members of the Institute community to make informed and timely decisions, accurate versions of the Institute's business data that are relevant to their decision must be readily available to them.

IMPLICATIONS:
Institute ownership prevails regardless of the

  • Form in which the data are stored: paper or electronic
  • Storage location
    • central office file cabinets or drawers in departmental desks
    • large mainframe disk farms or floppy diskettes
  • Transport method: courier, campus mail, the campus network or a local office's network.

Members of the Institute community who can demonstrate a need to use its business data for Institute business purposes should have access unless Institute policy prohibits it. Authorized access should be available in a medium and at a frequency that meets the receivers' business needs. The meaning of data should be clear to the user and consistent with definitions of related data. The data should be accurate and up to date.

Where to get what data should be readily discoverable to authorized members of the community, whether they be newcomers or older hands who have new information needs.

IMPLEMENTATION:
Information Systems will act as a clearinghouse for compiling a directory of the collections of the Institute's business data, including the present access practices associated with them.

Recommendations for how to maintain the directory information and make it available to authorized members of the community will be prepared for Steering Committee review.

An appeals process will be established for cases where access to Institute information for Institute business purposes has been denied to a member of the MIT community.


PRINCIPLE 5. DATA PRIVACY

Members of the Institute community are responsible for ensuring that their uses of the business data of the Institute are consistent with the Institute's policy on privacy of information.

RATIONALE:
The Institute commitment to protect the personal privacy of members of the MIT community is long standing and strong. It is clearly expressed in policy, in elements of the organization's structure, such as the Privacy Committee, and in practice.

"Recognizing that specific items of information about individual students, faculty, and staff (as well as former students, faculty and staff) must be maintained for the educational, research, and other institutional purposes of MIT, it is MIT policy that such information be collected, maintained, and used by the Institute only for appropriate, necessary, and clearly defined purposes, and that such information be controlled and safeguarded in order to insure the protection of personal privacy."

"...Such information should not be used or exchanged within the Institute for purposes other than those stated or related, legitimate purposes that would be reasonably expected."

IMPLICATIONS:
The Institute's policy on privacy of information applies for all the Institute's business data, regardless of the storage form or location. With the rapid proliferation of information technology in Institute offices and the interest in new applications, such as computerized phone directories, protection of individual privacy will occur only if all members of the community know about the policy and are diligent about complying with it.

Recipients of confidential data in files downloaded to workstations must maintain the confidentiality of those data. People who develop new applications in central offices or other departments must assure that their use of information conforms with the privacy policy. They will, for example, have to notify students and staff when information about them, collected for other purposes, will be put to a new use and give people the chance to refuse to have data about themselves included.

IMPLEMENTATION:
Contents of the various policy and procedure communications on this subject will be coordinated to assure that all members of the community understand the Institute's Policy on Privacy of Information and their responsibilities for compliance with it, for both computerized and manual records in all offices.


PRINCIPLE 6. DATA CUSTODIANSHIP

It is the responsibility of the designated custodian of a particular data collection to ensure data integrity, security and accessibility to all who demonstrate need.

RATIONALE:
Among all the business data stored in many locations at the Institute, certain collections of business data have been designated the official records of the Institute, and certain officers of the Institute have been designated the custodians of those official records.

The practice of custody has over the years been extended to include implicit custodianship of the computerized data from which official records are now usually printed. Although it is appropriate that ownership of the data be assigned to the Institute and accessibility be available to all who demonstrate need, it is also necessary to designate an identifiable focal point for assuring the protection of the records' accuracy, integrity and security.

IMPLICATIONS:
The roles and responsibilities of custodians for computerized information must be clear and broadly understood. The responsibilities are complex, balancing the sometimes competing demands of daily operations, accessibility, privacy, legal constraints, and accuracy. Efforts to resolve complexities can lead individual custodians to differing interpretations of their roles. The differences may have frustrating results, especially for those who need to combine information from more than one data collection, as many departmental administrators do.

Identification of custodians must be explicit, but is increasingly complicated as offices share information and computerized databases. Resolving custody identification issues may impact current organizational forms.

IMPLEMENTATION:
A custodian will be identified for each collection of administrative data. For those data collections that cross organizational boundaries, the custodianship of individual data elements will be established.
Responsibilities of custodians for the data's logical and physical integrity and for responding to requests for access to the data will be clarified and communicated to all concerned.

Custodians will provide information about the data collections to Information Systems for the directory of the collections of the Institute's business information.


PRINCIPLE 7. INFORMATION CONSISTENCY

Administrative applications should facilitate information consistency across departments and not meet only the needs of the user-sponsor.

RATIONALE:
As requirements for business information expand beyond the single organizational unit that is the custodian of the data, it is critical that applications support data integration. Values and definitions of data that span organizations or systems must be maintained consistently in order to ensure accuracy for reporting and decision making, even though the data may not all be stored in a single location.

Attempts to relate data from several systems with differing data definitions and update procedures are expensive and can be impossible to achieve or sustain. When data values and definitions are consistent across applications, however, those who query or change data are assured of accuracy from a single point of contact, without having to verify contents of related systems. A home address change, for example, can reach all affected systems electronically. Inquiries to any of the systems then receive correct replies.

IMPLICATIONS:
User-sponsors of new systems must examine possibilities for data integration with other central administrative units, as well as the information needs of the user-sponsor's clients. Failure to address these issues may lead to the rejection of a proposal for a new system or for major changes to an existing system. When assessing alternatives for a new student system, for example, the Office of Registration and Student Financial Services should address the needs of faculty, students and department administrators who interact with it regularly, as well as those of other central departments, such as the Admissions Office and the Dean for Student Affairs.

IMPLEMENTATION:
Information Services and Technology (IS&T) will act as a clearinghouse for administrative data definitions. IS&T will work with data custodians to create definitions of data elements. IS&T will publish the definitions and work with the user-sponsors of systems to maintain the currency of the definitions.

When reviewing and evaluating proposals for major information systems expenditures, senior officers will examine whether the proposal facilitates data integration with other central administrative systems and whether it fulfills the information needs of the system's clients.


PRINCIPLE 8. TIMELINESS

Administrative applications should ensure that business information is available when needed to affect decisions.

RATIONALE:
Business information must be made readily available to both the central administrative departments and the creators and ultimate consumers of the data in the academic departments, laboratories, and research centers. For example, information that is needed to affect decisions on a daily basis must be updated and made available daily. If central administrative data are not maintained and made available as needed, other units that need the information more frequently must undertake parallel efforts to capture, maintain, and report the data.

IMPLICATIONS:
Many of our current central administrative systems are batch systems whose update cycles are weekly or monthly. Much information needed by department administrators should be updated daily in order for them to support the functioning of their lab, department, or center. Central batch systems must migrate toward real-time systems or at least toward batch systems with a daily update cycle.

IMPLEMENTATION:
In creating new applications, the user-sponsor will analyze the information frequency requirements not only of its own organizational unit, but also of the creators and recipients of the data.

This analysis will be taken into account by the Steering Committee in reviewing proposals for new systems or major enhancements to existing systems. Preference will be given to those proposals that meet the information timeliness needs of all creators and consumers of the data in the system.


PRINCIPLE 9. ELECTRONIC DATA TRANSFER

Administrative applications should transfer data electronically from the point of origin and to the ultimate end user.

RATIONALE:
Costs of the technology for electronic data transfer are going down, while costs of people are increasing. The time people spend transcribing data (copying from one form, often a computer-generated report, to another, perhaps a workstation) could be better spent on other tasks. The possibility for error increases each time data must be copied. The further away from the point of origin, who might be a requisitioner wielding a pencil, that data are captured electronically, the more likely are data errors. Time spent reconciling discrepancies could be used more productively.

IMPLICATIONS:
The organization and practices of MIT offices, central and otherwise, support the present paper flow. Extending the boundaries over which data are transferred electronically changes how all offices operate. In addition, the variety of practices in offices across MIT probably must conform to a more standard set, so development and support of applications for data collection and delivery can be focused.

The present portfolio of administrative applications must change to support extensive transfer of data among workstations on demand, beyond simply uploading and downloading files. The necessary changes are more than cosmetic. Today's entry screens, validation routines, data structures, and processing cycles support the central offices, which are usually neither the point of origin for such raw materials as time sheets and requisitions, nor the ultimate end user for such products as SANDI's and monthly statements.

The Institute's information technology environment should include easy-to-use facilities for finding out what data are available, for assuring the validity of someone's electronic signature, and for managing the process of routing electronic actions from departments along appropriate approval paths.

IMPLEMENTATION:
Steering Committee consideration of proposals will favor those that use electronic data transfer to accommodate departmental and senior management information needs.

Business system projects will include analysis of the organizational impacts of electronic data transfer.

Information Services and Technology (IS&T) will undertake infrastructure projects to support extended electronic data transfer.