Data Initiatives Group - MIT Engineering and Science Libraries (ESL)

Explorations by library, information, and data professionals on the challenges of managing, sharing, and communicating scientific data

Bibliography and readings:

2007

Dealing with data (UKOLN), 2007

Karen S. Baker and Florence Millerand, Scientific Infrastructure Design: Information Environments and Knowledge Provinces. Accepted Apr 2007. To appear Oct 2007 in Proceedings of the American Society for Information Science and Technology (ASIS&T).

June 2007 issue of First Monday (volume 12, number 6) includes selected papers from the conference "Cyberinfrastructure for Collaboration and Innovation" held in Washington, D.C. 29-30 January 2007. The special editors for this issue are Brian Kahin and Steven J. Jackson.

May 15-18, 2007: 33rd annual conference of the International Association for Social Science Information Service and Technology (IASSIST). The theme was Building Global Knowledge Communities with Open Data.

STS/ACRL posters at ALA annual, 2007 (Institutional repositories and non-textual data)

Background papers from April 2007 NSF-JISC conference on repositories (2007)

History and theory of infrastructure: Lessons for New Scientific Cyberinfrastructures (NSF workshop, Sept-Oct 2006, and report January 2007).

2006

Commons of Science Conference - Creating a Vision of Making Scientific Data Accessible Across Disciplines. Oct 3-4, 2006 Washington D.C. USA; papers

Towards 2020 Science, by the 2020 Science Group (Microsoft Research), March 21, 2006. Emphasizes the growing role of information technology, e-science and related developments. See: Overview, Report: Science in 2020, and related articles in Nature.

* "... computer science and the sciences that has the potential to have a profound impact on science. It is a leap from the application of computing to support scientists to ‘do’ science (i.e. ‘computational science’) to the integration of computer science concepts, tools and theorems into the very fabric of science." (p. 8)
* "Conceptual and technological tools developed within computer science are, for the first time, starting to have wide-ranging applications outside the subject in which they originated, especially in sciences investigating complex systems, most notably in biology and chemistry. Indeed, we believe computer science is poised to become as fundamental to biology as mathematics has become to physics." (p. 8) - scientists need to internalize computer science, not just use the tools. This will affect how scientists are (or should be) taught.
* "computer science concepts and tools in science form a third, and vital component of enabling a ‘golden triangle’ to be formed with novel mathematical and statistical techniques in science, and scientific computing platforms and applications integrated into experimental and theoretical science." (p. 8)
* "challenge of end-to-end scientific data management, from data acquisition and data integration, to data treatment, provenance and persistence." "... vitally more important, and dramatic in its impact, will be the integration of new conceptual and technological tools from computer science into the sciences." "...based on the use of a generic computational environment, in the same way that they learn universally applicable mathematical skills." The report views ‘computational science’ as a separate ‘third pillar’ in science, alongside experimental and theoretical science, as an intermediate, unsustainable and undesirable state.
* Significant implications for scientific publishing; a paradigm shift: the traditional sequence of ‘experiment › analysis › publication’ => ‘experiment › data organisation › analysis › publication’ (p. 16)
* profound developments in the future of computing.
* significant implications for the education of tomorrow’s scientists and science policy and funding. Scientists will need to be completely computationally and mathematically literate, and by 2020

There's gold in those archives, HHMI Bulletin, 2006

To stand the test of time: Sept. 2006 ARL/NSF workshop on E-Science (full report); related resources from ARL

Minding the planet: The meaning and future of the Semantic Web (Nov. 2006)

Special issue on data standards in biological research. OMICS: A Journal of Integrative Biology vol 10(2), 2006.

Data publication in the open access initiative. Jens Klump, et al., Data Science Journal, Vol 5, 15 June 2006, 79-83.

Dealing with data deluge: Chemical informatics professionals help scientists cope with and benefit from information overload. July 17, 2006, Volume 84, Number 29, pp. 93-94, 96

NSF's Cyberinfrastructure Vision for 21st Century Discovery (NSF, January 2006) Version 5.0 of this document outlines strategic goals for developing "comprehensive infrastructure needed to capitalize on dramatic advances in information technology." A lot of emphasis on high performance computing, data analysis and visualization, collaboratories, also on workforce development.

Don't Leave the Data in the Dark: Issues in Digitizing Print Statistical Publications, Linden and Green, D-Lib Magazine, January 2006, Volume 12 Number 1

"Digitization has the potential to transform scholarly use of data found in print statistical publications. While presenting images of statistical tables in a digital library environment may be desirable, the full potential of such material can be realized only if the resulting digital objects are easy to search and manipulate and are accompanied by sufficient metadata to support extraction of numbers from tables and comparison of numbers across tables."

Bringing chemical data onto the Semantic Web. Taylor, K. R.; Gledhill, R. J.; Essex, J. W.; Frey, J. G.; Harris, S. W.; De Roure, D. C. School of Chemistry, University of Southampton, UK. Journal of Chemical Information and Modeling. Web Release Date: 26-Jan-2006; (Article) DOI: 10.1021/ci050378m

* good description of the Semantic Web and how RDF technology (the one used to create RSS feeds) could be used with chemical information.
* Quoting the last paragraph - "The Semantic Web is an ambitious goal, and having chemical information there even more so, requiring contributions from multiple players in order to achieve maximum benefit. In chemistry, only a comparatively small population is interested in any particular area. One can conceive of free data exchange and banishment of the proprietary file format, but there are parties who do not want to make data more easily available to their competitors. However, this work demonstrates the value of adopting this approach on the scale of this project..."

Jane Hunter, Scientific models - A user-oriented approach to the integration of scientific data and digital libraries

Many scientific communities are struggling with the challenge of how to manage the terabytes of data they are producing, often on a daily basis. Scientific models are the primary method for representing and encapsulating expert knowledge in many disciplines. Scientific models could also provide a mechanism: for publishing and sharing scientific results; for teaching complex scientific concepts; and for the selective archival, curation and preservation of scientific data. As such, they also provide a bridge for collaboration between Digital Libraries and eScience. In this paper I describe research being undertaken within the FUSION project at the University of Queensland to enable scientists to construct, publish and manage scientific model packages that encapsulate and relate the raw data to its associated contextual and provenance metadata, processing steps, derived information and publications. This work involves extending tools and services that have come out of the Digital Libraries domain to support e-Science requirements.

2005

Long-lived data collections enabling research and education in the 21st century - National Science Board NSB05-40

E-Research: An imperative for strengthening institutional partnerships, Linda O’Brien. Educause Review November/December 2005, Volume 40, Number 6.

A transformation is clearly occurring in research practice, a transformation that will have a profound impact on the roles of information professionals within higher education.

2004

Will the Semantic Web change science? Tim Finin and Joel Sachs, Sept. 2004.

Digital preservation and permanent access to scientific information: The state of the practice CENDI - 2004-3

Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure (aka the Atkins report, 2004) Essential reading.

2003

Clifford Lynch, “Institutional repositories: Essential infrastructure for scholarship in the digital age”, ARL Bimonthly (February 2003)

NIH data sharing policy: 2003. Includes goals, definitions and examples of data sharing, including data archiving and "enclaves". NIH grantees with $500K-plus funding are expected to include a data sharing plan.