ITAG
Architectural Guidelines: Data
- Institute data SHOULD
NOT be released to or stored by a third party without an approved
business reason.
- Business data
created or obtained within the Institute belongs to the
institution, not to any particular function, unit, or
individual.
- Members of the
Institute community MUST ensure that their uses of the business
data of the Institute are consistent with the Institute's policy
on privacy of information.
- The designated custodian
of a particular data collection MUST ensure data integrity, security,
and accessibility to anyone who demonstrates need.
- Critical and sensitive
information MUST be kept on machines that are professionally
managed.
- The Warehouse SHOULD
be used for reporting where possible.
- Enterprise Data SHOULD
be in the Warehouse.
- Data feeds SHOULD come
from the Warehouse where possible.
- Systems of Record MUST
be established for all shared data.
- Social Security numbers
MUST NOT be used or stored without a clear business need.
- Social Security numbers
MUST NOT be used or stored as a unique identifier.
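The Social Security number guidelines above can be illustrated with a minimal sketch, assuming a hypothetical `Person` record and `register` helper (neither comes from this document). Each person is keyed by an opaque, randomly generated identifier, and the record has no SSN field at all, so the key can be shared as a join key without disclosing sensitive information.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Person:
    """Hypothetical person record keyed by an opaque surrogate ID.

    There is deliberately no SSN field: the key carries no personal
    information, so it can be shared as a join key with other systems.
    """

    name: str
    # A random UUID (hex form) generated when the record is created.
    person_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def register(directory: dict[str, Person], name: str) -> Person:
    """Create a person record and index it by its surrogate ID."""
    person = Person(name=name)
    directory[person.person_id] = person
    return person


directory: dict[str, Person] = {}
alice = register(directory, "Alice Example")
bob = register(directory, "Bob Example")
print(len(directory))  # 2 (each person gets a distinct ID)
```

A random identifier was chosen here because it encodes nothing about the person; any scheme with that property would serve the same purpose.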
Many of the principles embedded here were first described in 1989 in the
Administrative Computing Principles. The statements applicable to data
were numbers 4 through 9, which are listed below. The full text of these
principles is included as an appendix to this document.
4. DATA OWNERSHIP
Business data created at or obtained within the Institute belong to the
institution, not to any particular function, unit, or individual.
5. DATA PRIVACY
Members of the Institute community are responsible for ensuring that
their uses of the business data of the Institute are consistent with the
Institute's policy on privacy of information.
6. DATA CUSTODIANSHIP
It is the responsibility of the designated custodian of a particular
data collection to ensure data integrity, security, and accessibility to
all who demonstrate need.
7. INFORMATION CONSISTENCY
Administrative applications should facilitate information consistency
across departments and not meet only the needs of the user-sponsor.
8. TIMELINESS
Administrative applications should ensure that business information is
available when needed to affect decisions.
9. ELECTRONIC DATA TRANSFER
Administrative applications should transfer data electronically from the
point of origin and to the ultimate end user.
At the core of these principles is the understanding that the
Institute’s data is a valuable asset and therefore needs to be well
managed. Good data management practices are based on a few common-sense
ideas:
- Don’t store data
unless you know why.
- Don’t collect data
that is already being collected at MIT.
- Don’t collect data
until it’s needed.
- Don’t store data
unless there is a plan to maintain it.
- Decide data retention
policies before collecting data.
- Review data models
before building a system.
- Document the data
definition and sensitivity before collection.
- Only update data in
its System of Record.
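Several of these practices are things to settle before the first record is collected. As a minimal sketch, assuming a hypothetical `CollectionProposal` form and `missing_items` check (and illustrative sensitivity labels), a proposal for a new data element can be checked for the items the list asks for: a purpose, a maintenance plan, a retention policy, a documented definition and sensitivity, and a named System of Record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CollectionProposal:
    """Hypothetical description of a new data element, completed before collection."""

    name: str
    purpose: str = ""                      # why the data is needed
    maintenance_plan: str = ""             # how it will be kept accurate
    retention_days: Optional[int] = None   # retention policy, decided up front
    definition: str = ""                   # documented meaning of the element
    sensitivity: str = ""                  # e.g. "public" or "confidential" (illustrative labels)
    system_of_record: str = ""             # the one place the element is updated


def missing_items(p: CollectionProposal) -> list[str]:
    """List the practices that this proposal does not yet satisfy."""
    checks = {
        "purpose": bool(p.purpose),
        "maintenance plan": bool(p.maintenance_plan),
        "retention policy": p.retention_days is not None,
        "definition": bool(p.definition),
        "sensitivity": bool(p.sensitivity),
        "system of record": bool(p.system_of_record),
    }
    return [item for item, ok in checks.items() if not ok]


draft = CollectionProposal(name="home_address", purpose="Mail official notices")
print(missing_items(draft))
# ['maintenance plan', 'retention policy', 'definition', 'sensitivity', 'system of record']
```

An empty result means every item has at least been addressed; whether the answers are adequate is still a judgment for Data Administration.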
Before beginning to collect new data
If you don’t have a very clear reason to collect and store data, don’t:
it is costly and can be a risk. This sounds obvious, but many people
decide to collect data because it’s easy at the time and they think it
might be useful later. These are not sufficient reasons by themselves.
This practice tends to lead to collections of data that aren’t accurate,
because there never was a real plan to use and maintain them. Data is
only valuable if it is accurate and can be relied on, and the best way
of assuring this is to make sure it is used. Increasing access to and
usage of data leads to improved data quality.
One common example is collecting and storing SSNs before they are
needed, because they might be hard to obtain later. This practice puts
the Institute at risk of disclosing sensitive information. In addition,
some processes collect SSNs from everyone, even though they will later
be needed for only a fraction of that population. Again, this exposes
MIT to more risk.
Another common idea is that since storage is so cheap, data can be kept
forever. In many cases this is fine, but for certain sensitive data it
is a mistake that exposes MIT to risk. Data retention policies should be
determined up front, and data should be purged in accordance with them.
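Purging in accordance with a retention policy can be sketched as follows, assuming a hypothetical per-category policy table (`RETENTION_DAYS`, with made-up categories and periods) and a hypothetical `purge_expired` helper. Records past their category’s retention period are dropped, and a category with no policy is treated as an error rather than kept indefinitely, since the policy should have been decided before collection.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str
    category: str
    collected_on: date


# Hypothetical retention policy: how many days each category may be kept.
RETENTION_DAYS = {
    "applicant_ssn": 90,
    "course_enrollment": 365 * 7,
}


def purge_expired(records: list[Record], today: date) -> list[Record]:
    """Return only the records still inside their retention period.

    A category with no policy is treated as an error, because the policy
    should have been decided before the data was collected.
    """
    kept = []
    for r in records:
        if r.category not in RETENTION_DAYS:
            raise ValueError(f"no retention policy for category {r.category!r}")
        age = today - r.collected_on
        if age <= timedelta(days=RETENTION_DAYS[r.category]):
            kept.append(r)
    return kept


records = [
    Record("a", "applicant_ssn", date(2024, 1, 1)),      # 152 days old: past 90
    Record("b", "applicant_ssn", date(2024, 5, 1)),      # 31 days old: kept
    Record("c", "course_enrollment", date(2020, 1, 1)),  # about 4.4 years old: kept
]
print([r.record_id for r in purge_expired(records, date(2024, 6, 1))])  # ['b', 'c']
```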
Often at MIT, data is collected in one area of the Institute without
another department knowing about it. This leads to wasted effort. To
avoid this, check with Data Administration before collecting new data.
For every piece of data, MIT should establish its System of Record, that
is, the master copy of the information. In addition, the data model
should be reviewed to ensure that it includes the “keys” needed to join
the information with other Institute data and that it is consistent
with related information.
If you need help answering these questions, contact data-admin@mit.edu.
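One way to make “only update data in its System of Record” enforceable is a registry that maps each shared data element to the system holding the master copy. The sketch below assumes a hypothetical registry (`SYSTEM_OF_RECORD`, with made-up system names) and a hypothetical `authorize_update` check; an update requested by any other system is rejected, and an element with no System of Record established is rejected as well.

```python
# Hypothetical registry mapping each shared data element to its System of Record.
SYSTEM_OF_RECORD = {
    "home_address": "hr_system",
    "course_grade": "registrar_system",
}


class NotSystemOfRecordError(Exception):
    """Raised when a system tries to update data it does not hold the master copy of."""


def authorize_update(element: str, requesting_system: str) -> None:
    """Permit an update only when it is made in the element's System of Record."""
    owner = SYSTEM_OF_RECORD.get(element)
    if owner is None:
        raise KeyError(f"no System of Record established for {element!r}")
    if requesting_system != owner:
        raise NotSystemOfRecordError(
            f"{element!r} must be updated in {owner}, not {requesting_system}"
        )


authorize_update("home_address", "hr_system")  # allowed: no exception
try:
    authorize_update("home_address", "reporting_warehouse")
except NotSystemOfRecordError as err:
    print(err)  # 'home_address' must be updated in hr_system, not reporting_warehouse
```

Other systems, such as a reporting warehouse, would then hold read-only copies fed from the System of Record.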
Responsibilities of a Data Steward
- Share data with all who demonstrate an MIT business need.
- Store systems with valuable data on professionally managed machines
that are regularly patched with security updates and have regular
offsite backups.
Appendix
From the 1989 document Administrative Computing Principles
PRINCIPLE 4. DATA OWNERSHIP
Business data created at or obtained within the Institute belong to the
institution, not to any particular function, unit, or individual.
RATIONALE:
Data gathered or produced for the business purposes of the Institute
cannot be "owned" by a single individual or functional unit within MIT.
Protection of individual privacy rights and compliance with legal and
fiduciary requirements mandate that the data are owned by the
institution. For members of the Institute community to make informed and
timely decisions, accurate versions of the Institute's business data
that are relevant to their decision must be readily available to them.
IMPLICATIONS:
Institute ownership prevails regardless of the:
- Form in which the data
are stored: paper or electronic
- Storage location:
  - central office file cabinets or drawers in departmental desks
  - large mainframe disk farms or floppy diskettes
- Transport method:
courier, campus mail, the campus network or a local office's
network.
Members of the Institute
community who can demonstrate a need to use its business data for
Institute business purposes should have access unless Institute policy
prohibits it. Authorized access should be available in a medium and at a
frequency that meets the receivers' business needs. The meaning of data
should be clear to the user and consistent with definitions of related
data. The data should be accurate and up to date.
Where to get what data should be readily discoverable to authorized
members of the community, whether they be newcomers or older hands who
have new information needs.
IMPLEMENTATION:
Information Systems will act as a clearinghouse for compiling a
directory of the collections of the Institute's business data, including
the present access practices associated with them.
Recommendations for how to maintain the directory information and make
it available to authorized members of the community will be prepared for
Steering Committee review.
An appeals process will be
established for cases where access to Institute information for
Institute business
purposes has been denied to a member of the MIT community.
PRINCIPLE 5. DATA PRIVACY
Members of the Institute community are responsible for ensuring that
their uses of the business data of the
Institute are consistent with the Institute's policy on privacy of
information.
RATIONALE:
The Institute commitment to protect the personal privacy of members of
the MIT community is long standing and strong. It is clearly expressed
in policy, in elements of the organization's structure, such as the
Privacy Committee, and in practice.
"Recognizing that specific
items of information about individual students, faculty, and staff (as
well as former students, faculty and staff) must be maintained for the
educational, research, and other institutional purposes of MIT, it is
MIT policy that such information be collected, maintained, and used by
the Institute only for appropriate, necessary, and clearly defined
purposes, and that such information be controlled and safeguarded in
order to insure the protection of personal privacy."
"...Such information should not be used or exchanged within the
Institute for purposes other than those stated or related, legitimate
purposes that would be reasonably expected."2
IMPLICATIONS:
The Institute's policy on privacy of information applies to all the
Institute's business data, regardless of the storage form or location.
With the rapid proliferation of information technology in Institute
offices and the interest in new applications, such as computerized phone
directories, protection of individual privacy will occur only if all
members of the community know about the policy and are diligent about
complying with it.
Recipients of confidential data in files downloaded to workstations must
maintain the confidentiality of those data. People who develop new
applications in central offices or other departments must assure that
their use of information conforms with the privacy policy. They will,
for example, have to notify students and staff when information about
them, collected for other purposes, will be put to a new use and give
people the chance to refuse to have data about themselves included.
IMPLEMENTATION:
Contents of the various policy and procedure communications on this
subject will be coordinated to assure that all members of the community
understand the Institute's Policy on Privacy of Information and their
responsibilities for compliance with it, for both computerized and
manual records in all offices.
PRINCIPLE 6. DATA CUSTODIANSHIP
It is the responsibility of the designated custodian of a particular
data collection to ensure data integrity,
security and accessibility to all who demonstrate need.
RATIONALE:
Among all the business data stored in many locations at the Institute,
certain collections of business data have been designated the official
records of the Institute, and certain officers of the Institute have
been designated the custodians of those official records.
The practice of custody has
over the years been extended to include implicit custodianship of the
computerized data from which official records are now usually printed.
Although it is appropriate that ownership of the data be assigned to the
Institute and accessibility be available to all who demonstrate need, it
is also necessary to designate an identifiable focal point for assuring
the protection of the
records' accuracy, integrity and security.
IMPLICATIONS:
The roles and responsibilities of custodians for computerized
information must be clear and broadly
understood. The responsibilities are complex, balancing the sometimes
competing demands of daily operations, accessibility, privacy, legal
constraints, and accuracy. Efforts to resolve complexities can lead
individual custodians to differing interpretations of their roles. The
differences may have frustrating results, especially for those who need
to combine information from more than one data collection, as many
departmental administrators do.
Identification of custodians must be explicit, but is increasingly
complicated as offices share information and computerized databases.
Resolving custody identification issues may impact current
organizational forms.
IMPLEMENTATION:
A custodian will be identified for each collection of administrative
data. For those data collections that cross organizational boundaries,
the custodianship of individual data elements will be established.
Responsibilities of custodians for the data's logical and physical
integrity and for responding to requests for access to the data will be
clarified and communicated to all concerned.
Custodians will provide information about the data collections to
Information Systems for the directory of the collections of the
Institute's business information.
PRINCIPLE 7. INFORMATION CONSISTENCY
Administrative applications should facilitate information consistency
across departments and not meet only the needs of the user-sponsor.
RATIONALE:
As requirements for business information expand beyond the single
organizational unit that is the custodian of the data, it is critical
that applications support data integration. Values and definitions of
data that span organizations or systems must be maintained consistently
in order to ensure accuracy for reporting and decision making, even
though the data may not all be stored in a single location.
Attempts to relate data from several systems with differing data
definitions and update procedures are expensive and can be impossible to
achieve or sustain. When data values and definitions are consistent
across applications, however, those who query or change data are assured
of accuracy from a single point of contact, without having to verify
contents of related systems. A home address change, for example, can
reach all affected systems electronically. Inquiries to any of the
systems then receive correct replies.
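The home address example describes one point of contact whose changes reach every affected system electronically. A minimal in-process publish/subscribe (observer) sketch of that idea follows; the class, method, and system names are hypothetical, and a real deployment would pass changes between separate systems over some messaging mechanism that this document does not specify.

```python
from typing import Callable

# A subscriber receives (person_id, new_address) whenever an address changes.
Listener = Callable[[str, str], None]


class AddressSystemOfRecord:
    """Hypothetical single point of contact for home addresses.

    Dependent systems subscribe once; each change made here is pushed to all
    of them, so nobody re-keys the address into each system by hand.
    """

    def __init__(self) -> None:
        self._addresses: dict[str, str] = {}
        self._subscribers: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._subscribers.append(listener)

    def change_address(self, person_id: str, address: str) -> None:
        self._addresses[person_id] = address   # update the master copy first
        for notify in self._subscribers:       # then push it to every subscriber
            notify(person_id, address)

    def address_of(self, person_id: str) -> str:
        return self._addresses[person_id]


# Two hypothetical downstream systems that keep read-only copies.
payroll: dict[str, str] = {}
alumni_mailing: dict[str, str] = {}

sor = AddressSystemOfRecord()
sor.subscribe(payroll.__setitem__)        # dict.__setitem__(key, value) fits Listener
sor.subscribe(alumni_mailing.__setitem__)

sor.change_address("p-001", "1 Example St, Cambridge, MA")
print(payroll["p-001"] == alumni_mailing["p-001"])  # True
```

An inquiry to any of the three copies then returns the same answer, which is the consistency the paragraph above describes.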
IMPLICATIONS:
User-sponsors of new systems must examine possibilities for data
integration with other central administrative units, as well as the
information needs of the user-sponsor's clients. Failure to address
these issues may lead to the rejection of a proposal for a new system or
for major changes to an
existing system. When assessing alternatives for a new student system,
for example, the Office of Registration and Student Financial Services
should address the needs of faculty, students and department
administrators who interact with it regularly, as well as those of other
central departments, such as the Admissions Office and the Dean for
Student Affairs.
IMPLEMENTATION:
Information Services and Technology (IS&T) will act as a clearinghouse for administrative data
definitions. IS&T will work with data custodians to create definitions of
data elements. IS&T will publish the definitions and work with the
user-sponsors of systems to maintain the currency of the definitions.
When reviewing and
evaluating proposals for major information systems expenditures, senior
officers will examine whether the proposal facilitates data integration
with other central administrative systems and whether it fulfills the
information needs of the system's clients.
PRINCIPLE 8. TIMELINESS
Administrative applications should ensure that business information is
available when needed to affect decisions.
RATIONALE:
Business information must be made readily available to both the central
administrative departments and the creators and ultimate consumers of
the data in the academic departments, laboratories, and research
centers. For example, information that is needed to affect decisions on
a daily basis must be updated and made available daily. If central
administrative data are not maintained and made available as needed,
other units that need the information more frequently must undertake
parallel efforts to capture, maintain, and report the data.
IMPLICATIONS:
Many of our current central administrative systems are batch systems
whose update cycles are weekly or monthly. Much information needed by
department administrators should be updated daily in order for them to
support the functioning of their lab, department, or center. Central
batch systems must migrate toward real-time systems or at least toward
batch systems with a daily update cycle.
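The gap between a weekly batch cycle and a daily need can be made visible with a simple freshness check. This is a sketch under stated assumptions: `REQUIRED_FRESHNESS` and `stale_datasets` are hypothetical names, and the data sets and maximum ages are illustrative, not taken from any MIT system. A data set is flagged when its last update is older than its consumers can accept, or when it has never been loaded.

```python
from datetime import datetime, timedelta

# Hypothetical freshness requirements: the maximum age each consumer can accept.
REQUIRED_FRESHNESS = {
    "lab_expenditures": timedelta(days=1),   # used for daily decisions
    "annual_budget": timedelta(days=30),
}


def stale_datasets(last_updated: dict[str, datetime], now: datetime) -> list[str]:
    """Return data sets that are older than their consumers need.

    A data set that has never been loaded counts as stale.
    """
    stale = []
    for name, max_age in REQUIRED_FRESHNESS.items():
        loaded = last_updated.get(name)
        if loaded is None or now - loaded > max_age:
            stale.append(name)
    return sorted(stale)


now = datetime(2024, 6, 3, 8, 0)
last_updated = {
    "lab_expenditures": datetime(2024, 5, 27, 2, 0),  # weekly batch: 7 days old
    "annual_budget": datetime(2024, 5, 20, 0, 0),     # 14 days old
}
print(stale_datasets(last_updated, now))  # ['lab_expenditures']
```

Moving the expenditure feed to a daily cycle would clear the flag, which is the migration the paragraph above calls for.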
IMPLEMENTATION:
In creating new applications, the user-sponsor will analyze the
information frequency requirements not only of its own organizational
unit, but also of the creators and recipients of the data.
This analysis will be taken into account by the Steering Committee in
reviewing proposals for new systems or major enhancements to existing
systems. Preference will be given to those proposals that meet the
information timeliness needs of all creators and consumers of the data
in the system.
PRINCIPLE 9. ELECTRONIC DATA TRANSFER
Administrative applications should transfer data electronically from the
point of origin and to the ultimate end user.
RATIONALE:
Costs of the technology for electronic data transfer are going down,
while costs of people are increasing. The time people spend transcribing
data (copying from one form, often a computer-generated report, to
another, perhaps a workstation) could be better spent on other tasks.
The possibility for error increases each time data must be copied. The
further from the point of origin (who might be a requisitioner wielding
a pencil) that data are captured electronically, the more likely data
errors become. Time spent reconciling discrepancies could be
used more productively.
IMPLICATIONS:
The organization and practices of MIT offices, central and otherwise,
support the present paper flow. Extending the boundaries over which data
are transferred electronically changes how all offices operate. In
addition, the variety of practices in offices across MIT probably must
conform to a more standard set, so development and support of
applications for data collection and delivery can be focused.
The present portfolio of administrative applications must change to
support extensive transfer of data among workstations on demand, beyond
simply uploading and downloading files. The necessary changes are more
than cosmetic. Today's entry screens, validation routines, data
structures, and processing cycles support the central offices, which are
usually neither the point of origin for such raw materials as time
sheets and requisitions, nor the ultimate end user for such products as
SANDI's and monthly statements.
The Institute's information technology environment should include
easy-to-use facilities for finding out what data are available, for
assuring the validity of someone's electronic signature, and for
managing the process of routing electronic actions from departments
along appropriate approval paths.
IMPLEMENTATION:
Steering Committee consideration of proposals will favor those that use
electronic data transfer to accommodate departmental and senior
management information needs.
Business system projects will include analysis of the organizational
impacts of electronic data transfer.
Information Services and Technology (IS&T) will undertake infrastructure projects to support
extended electronic data transfer.