Data Initiatives Group - MIT Engineering and Science Libraries (ESL)

Explorations by library, information, and data professionals on the challenges of managing, sharing, and communicating scientific data

Data explorations: faculty interview questions and findings

General questions:

  • What kind of data do you collect in your research?
  • Where do you store your data?
  • Where do you store the data you share collectively among researchers?
  • How do you organize your datasets? (for example, via a database)
  • Do you apply any metadata to your data as part of organizing it?
  • How do you archive/preserve your data?
  • Are you interested in archiving/preserving your data?
  • Can you think of reasons why you might want to/not want to archive/preserve your data?
  • Have you considered using DSpace to archive/preserve your data?
  • Would you consider consulting a librarian for help organizing/archiving/preserving your data? (Librarians can help with: metadata, standards, intellectual property, archiving/preservation issues)

Supplementary questions:

Geosciences and civil engineering:

  • Please describe the data you generate.
  • Is there a specific software package or algorithm associated with your data generation?
  • Is there a GIS component to your modeling software?
  • If there is not a GIS component to your modeling software, do you use the MIT IS&T-licensed GIS software (ArcGIS or PCI Geomatica)? Do you use other GIS software? If so, what is it called, and why do you use it?
  • Do you collect data generated by other people?
  • Do you use data from a genome data bank or other biology-related source?
  • Do you use data from the MIT geodata repository?
  • Where do you store data?
  • Would you like to attach your data to papers you publish?

Computer science:

  • Have you ever experienced a loss of data when a graduate student left and you could no longer access their data?
  • Is your profession actively looking for solutions to data storage/reuse issues?

Life sciences:

  • What is the biggest need for data in your institute? (For example, microscopy data was one suggestion I received.)
  • How can the libraries help with this? (storage, good accessibility to stored data, a central repository for images on a gargantuan scale)
  • Do departments want to share data in repositories? Please comment.
  • Some libraries around the US are beginning to hire bioinformatics specialists. Do you feel that is needed at MIT?
  • What is the real value of the library to you in these areas? (one example is Biobase, which you mentioned previously)
  • How can the libraries help with data instruction? (To date, we have collaborated with the Cancer Center and the Broad on instruction: we provide the classroom and publicity, and scientists teach sessions such as training for GenePattern, genome sequencing analysis software.)
  • What are your informatics needs?
  • What products are you using (e.g. NCBI Entrez databases)?
  • What type of products should be available campus-wide?
  • Who does the training in your department?
  • What about training in graduate and undergraduate courses?
  • Does your research group maintain your own informatics database?

General findings:

  • Data is difficult to store and organize consistently.
  • Metadata is rarely assigned, and when it is, it is applied in a haphazard way.
  • Some data is not shareable, due to privacy considerations.
  • Loss of control over data is an issue.
  • Time and effort are needed to manage data effectively.
  • There are many technical barriers to data storage and organization.
  • Researchers would like libraries to provide access to commercial databases.
  • Researchers need to get to earlier versions of data sets.
  • Researchers are curious about the flexibility of DSpace.
  • There is a perceived need for centralized storage of data.
  • Researchers need to evaluate what is worth archiving.
  • Researchers have faith in the longevity of current formats.