Starting is Half the Battle: Collecting as the First Step to Software Preservation

Quick note: This blog post will also be posted on The Future of Information Alliance (FIA) website.

Image 3
Example of the types of software developed at the National Library of Medicine

As the National Digital Stewardship Resident (NDSR) at the National Library of Medicine, I am currently devising a software preservation pilot strategy. What this strategy entails is the repository ingest of software materials held on obsolete media, the description of said materials, and the creation or digitization of contextual materials. Complicating this project is the fact that there has not been a comprehensive collection strategy for software at NLM, and many documents and copies of software have been lost over time. With this in mind, the first, and perhaps most important step, is for institutions to include software and software documentation as a part of their pre-existing collection strategies.

Software preservation is, quite simply, the attempt to make software usable many years in the future. Although it is possible to save the bytes of a piece of software, providing access to it and making it usable is more difficult. Because software relies on complex technical infrastructures in order to operate properly, future users may not be able to interact with software in a meaningful way if an institution only saves the bytes. For a software program to function, it needs to be installed on the correct operating system, on the correct hardware, and with any necessary ancillary programs or code libraries also installed. For a preservationist, this can be a nightmare.

Emulation is one way to deal with these complex dependencies, and recent attempts to make emulation an easier option for libraries and archives have made immense gaines. There are a variety of services – EaaS and The Olive Archive – that help people emulate computing environments and assist libraries and museums with the vast technical dependencies of software programs. While this technology is a big step forward for the field and for access to software-based materials, it does not constitute a complete response to the need for software preservation. Before getting to the point of emulating materials, an institution needs to address how it will collect software.

The process of devising and implementing a collection strategy for software and related materials can be daunting even for institutions whose collections are already closely aligned with software, computing, and technology history. Regardless, a comprehensive collection strategy is the first and most important step to preserving software. Without a collection strategy, it is more likely than not that the software will be lost before an institution has managed to device and implement larger strategies for futures users to access that software, either through emulation or another tool.

A proper collection strategy for software based materials should reflect the larger collection goals of the institution. It does not make sense for an art library to begin collecting scientific software, but it does make sense for an art library to collect software that artists created or used as part of their creative process. Just as adding A/V materials to a collection strategy does not mean that an institution needs to collect all A/V materials, collecting software does not mean collecting all software. The same care and attention paid to the wider collection strategy needs to be taken when considering software acquisitions.

Presenting about software preservation at the National Library of Medicine; Photo by Ben Petersen

Part of this collection strategy should include contemporaneous documentation and manuals. Throughout my project at NDSR, I’ve relied on manuals and documentation in order to get software running and to understand what the software is meant to do. Without this documentation, having copies of the software would prove of limited value. Furthermore, the marketing and packaging material for software can have historical importance itself. Any collection strategy for born digital materials should consider what analog material also needs to be collected so that future historians, archivists, and researchers can properly contextualize and assign meaning to a piece of software.

It is important the the collection strategy is well-documented and added to the wide collection strategy documents at the institution. Communicating the importance of collecting software or software-based materials is an essential part of beginning a software preservation program. An institution relies on many employees to acquire, care for, and provide access to its materials. Creating accessible documentation about software collecting and engaging in an open dialogue about collecting software is an important aspect to creating a sustainable program.

After software is added to an institution’s collection strategy as is applicable, there are many other questions and issues to attend to. Moving forward, it will be vital for institutions to get software into a digital repository and off of volatile tangible media like floppy disks and CD-ROMS. If the first step is to collect software, the second step is to save the bytes. However, simply adding software to the collections strategy will ensure that the institution, in the future, will have materials to preserve and showcase in whatever manner it decides best suits its resources, audiences, and needs. Without a comprehensive collection strategy, future actions, projects, and programs will be severely limited.

Writing for The Signal

So, I wrote a longer blog post for The Signal, and it went up today. You can view it here.

It reviews some of what I’ve written on this blog, but generally expands on the process of inventorying and describing software.