Report of the Indiana University Research Data Management Taskforce

The “data deluge” in the sciences (the ability to create massive streams of digital data) has been discussed at great length in the academic and lay press. The ease with which scientists can now produce data has transformed scientific practice. In many disciplines, creating data is now less of a challenge than making use of it, properly analyzing it, and properly storing it. Two aspects of the data deluge are less widely appreciated.

The first is that the data deluge is not confined to the sciences. Humanities scholars and artists are also generating data at prodigious rates. They do so through massive scanning projects, through the digitization of still photographs, video, and music, and through the creation of new musical and visual art forms that are inherently digital.

The second is that data collected now is potentially valuable forever. The genomic DNA sequences of a particular organism are what they are, and they are known precisely. More properly, the sequences of the contigs that are assembled to create the genome sequence are known precisely, although the proper assembly may be disputed. Such data will remain valuable indefinitely. For example, insofar as we wonder whether environmental changes are altering the population genetics of various organisms, data on the frequency of particular genetic variants in populations will always be relevant. Similarly, video and audio recordings of an American folk musician, a speaker of an endangered language, or a ballet performance will be of value indefinitely, even though debate over their interpretation and annotation may well continue. Such recordings can never be recreated, and so they remain useful indefinitely.
