Some of the materials located and collected at the National Library of Medicine over the past six months of the National Digital Stewardship Residency.
As the National Digital Stewardship Resident (NDSR) at the National Library of Medicine (NLM), I am currently devising a software preservation pilot strategy. The strategy entails ingesting software held on obsolete media into a repository, describing those materials, and creating or digitizing contextual materials. Complicating this project is the fact that NLM has never had a comprehensive collection strategy for software, and many documents and copies of software have been lost over time. With this in mind, the first and perhaps most important step is for institutions to include software and software documentation in their existing collection strategies.
Software preservation is, quite simply, the attempt to make software usable many years in the future. Although it is possible to save the bytes of a piece of software, providing access to it and making it usable is more difficult. Because software relies on complex technical infrastructures in order to operate properly, future users may not be able to interact with software in a meaningful way if an institution only saves the bytes. For a software program to function, it needs to be installed on the correct operating system, on the correct hardware, and with any necessary ancillary programs or code libraries also installed. For a preservationist, this can be a nightmare.
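To make the difference between saving the bytes and recording what those bytes depend on more concrete, here is a minimal Python sketch. It computes a checksum for a disk image and stores it alongside a description of the environment the software needs. The function names, field names, and example values are all hypothetical illustrations; they do not come from NLM's workflow or from any metadata standard.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class SoftwareRecord:
    # Hypothetical fields: the bytes (via checksum) plus the context
    # a future user would need in order to run them.
    title: str
    image_file: str
    sha256: str
    operating_system: str
    hardware: str
    dependencies: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record so it can be stored next to the image."""
        return json.dumps(asdict(self), indent=2)


def describe_image(path: Path, title: str, operating_system: str,
                   hardware: str, dependencies=()) -> SoftwareRecord:
    """Build a record for one disk image already copied off its media."""
    return SoftwareRecord(
        title=title,
        image_file=path.name,
        sha256=sha256_of_file(path),
        operating_system=operating_system,
        hardware=hardware,
        dependencies=list(dependencies),
    )
```

A call such as `describe_image(Path("disk1.img"), "Example Program", "MS-DOS 3.3", "IBM PC compatible")` (made-up values) would return a record whose `to_json()` output could be kept with the image. The checksum lets an institution confirm later that the bytes have not changed, while the other fields capture the dependencies that the bytes alone do not reveal.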
One of the NLM staff members interacting with legacy hardware and software.
Emulation is one way to deal with these complex dependencies, and recent attempts to make emulation an easier option for libraries and archives have made immense gains. Services such as EAAS and The Olive Archive help people emulate computing environments and assist libraries and museums with the vast technical dependencies of software programs. While this technology is a big step forward for the field and for access to software-based materials, it does not constitute a complete response to software preservation. Before getting to the point of emulating materials, an institution needs to address how it will collect software.
The process of devising and implementing a collection strategy for software and related materials can be daunting even for institutions whose collections are already closely aligned with software, computing, and technology history. Regardless, a comprehensive collection strategy is the first and most important step to preserving software. Without a collection strategy, it is more likely than not that software will be lost before an institution has managed to devise and implement larger strategies for future users to access and use that software, either through emulation or another tool.
A proper collection strategy for software-based materials should reflect the larger collection goals of the institution. It does not make sense for an art library to begin collecting scientific software, but it does make sense for an art library to collect software that artists created or used as a part of their creative process. Just as adding A/V materials to a collection strategy does not mean that an institution needs to collect all A/V-related materials, collecting software does not mean collecting all software. The same care and attention paid to the wider collection strategy needs to be taken when considering software acquisitions.
Part of this collection strategy should include contemporaneous documentation and manuals. Throughout my NDSR project, I’ve relied on manuals and documentation in order to get software running and to understand what the software is meant to do. Without this documentation, copies of the software would be of limited value. Furthermore, when software is held on tangible media, its marketing and packaging material can have historical importance of its own. Any collection strategy for born-digital materials should consider what analog materials also need to be collected so that future historians, archivists, and researchers can properly contextualize and assign meaning to a piece of software.
Screenshot of “How-To” Grateful Med, one of the NLM-developed software programs.
It is important that the collection strategy is well-documented and added to the wider collection strategy documents at the institution. Communicating the importance of collecting software programs or software-based materials is an essential part of beginning a software preservation project. An institution relies on many employees in order to acquire, care for, and provide access to its materials. Creating accessible documentation about software collection and engaging in an open dialogue about collecting software is an important aspect of creating a sustainable program.
After software is added to an institution’s collection strategy where applicable, there are many other questions and issues to attend to. Moving forward, it will be vital for institutions to get software onto a server and off of volatile tangible media like floppy disks and CD-ROMs. If the first step is to collect software, the second step is to save the bytes. However, simply adding software to the collection strategy will ensure that the institution, in the future, will have materials to preserve and showcase in whatever manner it decides best suits its resources, audiences, and needs. Without a comprehensive collection strategy, future actions, projects, and programs will be severely limited.