Starting is Half the Battle: Collecting as the First Step to Software Preservation

Quick note: This blog post will also be posted on The Future of Information Alliance (FIA) website.

Example of the types of software developed at the National Library of Medicine

As the National Digital Stewardship Resident (NDSR) at the National Library of Medicine, I am currently devising a software preservation pilot strategy. This strategy entails the repository ingest of software materials held on obsolete media, the description of said materials, and the creation or digitization of contextual materials. Complicating this project is the fact that there has not been a comprehensive collection strategy for software at NLM, and many documents and copies of software have been lost over time. With this in mind, the first, and perhaps most important, step is for institutions to include software and software documentation as a part of their pre-existing collection strategies.

Software preservation is, quite simply, the attempt to make software usable many years in the future. Although it is possible to save the bytes of a piece of software, providing access to it and making it usable is more difficult. Because software relies on complex technical infrastructures in order to operate properly, future users may not be able to interact with software in a meaningful way if an institution only saves the bytes. For a software program to function, it needs to be installed on the correct operating system, on the correct hardware, and with any necessary ancillary programs or code libraries also installed. For a preservationist, this can be a nightmare.

Emulation is one way to deal with these complex dependencies, and recent attempts to make emulation an easier option for libraries and archives have made immense gains. There are a variety of services, such as EaaS and The Olive Archive, that help people emulate computing environments and assist libraries and museums with the vast technical dependencies of software programs. While this technology is a big step forward for the field and for access to software-based materials, it does not constitute a complete response to the need for software preservation. Before getting to the point of emulating materials, an institution needs to address how it will collect software.

The process of devising and implementing a collection strategy for software and related materials can be daunting even for institutions whose collections are already closely aligned with software, computing, and technology history. Regardless, a comprehensive collection strategy is the first and most important step to preserving software. Without a collection strategy, it is more likely than not that the software will be lost before an institution has managed to devise and implement larger strategies for future users to access that software, either through emulation or another tool.

A proper collection strategy for software-based materials should reflect the larger collection goals of the institution. It does not make sense for an art library to begin collecting scientific software, but it does make sense for an art library to collect software that artists created or used as part of their creative process. Just as adding A/V materials to a collection strategy does not mean that an institution needs to collect all A/V materials, collecting software does not mean collecting all software. The same care and attention paid to the wider collection strategy needs to be taken when considering software acquisitions.

Presenting about software preservation at the National Library of Medicine; Photo by Ben Petersen

Part of this collection strategy should include contemporaneous documentation and manuals. Throughout my project at NDSR, I’ve relied on manuals and documentation in order to get software running and to understand what the software is meant to do. Without this documentation, having copies of the software would prove of limited value. Furthermore, the marketing and packaging material for software can have historical importance itself. Any collection strategy for born-digital materials should consider what analog material also needs to be collected so that future historians, archivists, and researchers can properly contextualize and assign meaning to a piece of software.

It is important that the collection strategy is well-documented and added to the wider collection strategy documents at the institution. Communicating the importance of collecting software or software-based materials is an essential part of beginning a software preservation program. An institution relies on many employees to acquire, care for, and provide access to its materials. Creating accessible documentation about software collecting and engaging in an open dialogue about collecting software are important aspects of creating a sustainable program.

After software is added to an institution’s collection strategy as is applicable, there are many other questions and issues to attend to. Moving forward, it will be vital for institutions to get software into a digital repository and off of volatile tangible media like floppy disks and CD-ROMs. If the first step is to collect software, the second step is to save the bytes. However, simply adding software to the collection strategy will ensure that the institution, in the future, will have materials to preserve and showcase in whatever manner it decides best suits its resources, audiences, and needs. Without a comprehensive collection strategy, future actions, projects, and programs will be severely limited.
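For readers wondering what "saving the bytes" can look like in practice, here is a minimal sketch in Python. It assumes a disk image has already been copied off the original media into a file (the names `sha256_of_file` and `verify_fixity` are my own illustrations, not part of any NLM workflow). It records a SHA-256 checksum so that a later check can confirm the stored copy has not changed.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Compare a file's current digest with the one recorded at ingest.
    (Hypothetical helper: a repository would normally store the
    recorded digest alongside the object's metadata.)"""
    return sha256_of_file(path) == recorded_digest.lower()
```

A checksum only shows that the bytes are unchanged. It says nothing about whether the software can still be run, which is why documentation and emulation remain separate concerns.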

Writing for The Signal

So, I wrote a longer blog post for The Signal, and it went up today. You can view it here.

It reviews some of what I’ve written on this blog, but generally expands on the process of inventorying and describing software.



Technological Optimism & Software Preservation

I’ve been going through Phillip Rogaway’s recent paper “The Moral Character of Cryptographic Work,” and the section on technological optimism was particularly striking. He writes, “Technological optimists believe that technology makes life better. According to this view, we live longer, have more freedom, enjoy more leisure. Technology enriches us with artifacts, knowledge, and potential. Coupled with capitalism, technology has become this extraordinary tool for human development. At this point, it is central to mankind’s mission.”

Later, he continues, “If you’re a technological optimist, a rosy future flows from the wellspring of your work. This implies a limitation on ethical responsibility. The important thing is to do the work, and do it well. This even becomes a moral imperative, as the work itself is your social contribution…a normative need [for the ethic of responsibility] vanishes if, in the garden of forking paths, all paths lead to good…. Unbridled technological optimism undermines the basic need for social responsibility.”

My first thought was a series of jokes about Max Weber. Someone more thoughtful than I needs to write a piece on the Protestant work ethic, capitalism, and the current shape of the tech industry. Maybe I will just write it when I am feeling more thoughtful, but that is for another day. As much as it would be fun to discuss how long-standing American mores affect the current shape of innovation, industry, and ethics in Silicon Valley, I am going to focus today on a topic that more closely relates to my current work at the National Library of Medicine.

One may not think that there is a direct connection between software preservation and the current prevalence of technological optimism. Preservation, of software or any other material, is simultaneously obsessed with the past and the future. A preservationist works for a future client, trying to ensure accessibility to those who may not be born yet. At the same time, the preservationist concerns herself with the past. Currently, I am digging through a pile of 5 1/4-inch floppy disks from the private papers of a researcher. I use 30-year-old manuals to try to fix hardware from the 1980s. I’m sitting in a pile of historic refuse, trying to ensure it will be usable in 2040. My relationship with time is a strange one – my thoughts, actions, and goals straddle the past and the future but barely touch the present.

The technological optimist, on the other hand, exists only (or at least mostly) in the present. If their work, as Rogaway argues, becomes a moral imperative in itself, then the past and future of that work do not have as much weight as the current status of the work itself. If all technology helps mankind and technological work is good, then why is it necessary to understand past technologies? Why is it necessary to understand the future ramifications of our current technology? Outside of improving present technology for future use, the present tense becomes the dominant focus when technology becomes its own moral good. Our understandings of time and how we fit into larger historical narratives affect how we relate to the world and, most importantly, the people in it. If we assume constant positive technological innovation, we assume many things about the lives of people all around the world. If the iPhone improves my life, does the existence of the iPhone improve all lives? How will we weigh these inevitably variant experiences?

Of course, the technological optimist, as Rogaway understands her, does not concern herself with these types of questions. Technology is progress and progress is good. Yet, as people begin to look more closely at the ethical and philosophical issues intrinsic in technological invention and use, there will need to be a historical record of technology. Software preservation is a key aspect of that historical record. Understanding software as an important historical object encourages the preservationist to preserve both software and the variety of documentation that software development and use rely on. Without documenting our technological history, we are far more likely to remain entranced with a world view that fails to consider the ramifications and nuances of technological innovation.

To be clear, this is not simply a question of understanding history so you are not doomed to repeat it. There is truth to that statement, but it is not the point. It is easy to disregard documenting software because software is intended for use. We use software until the next thing comes along and then we throw it away. However, with each piece of software we throw away, we also throw away ideas and innovations that may not have fit into contemporaneous understandings of technological progress. While a new version of a piece of software may be available, it does not mean that all aspects of that new version are actually improvements. We lose intellectual histories by disregarding software. With the loss of those histories, we also lose the ability to better discern what constitutes true growth and the role technology plays in that growth. If we are to create an understanding of technology free of the technological optimism that Rogaway describes, we need to preserve our software.

Brief NDSR Project Update

Over the past couple of months, I’ve been chipping away at my NDSR project, and I figure it is time to give an explicit update on my progress.

First, I’ve got a working Apple II and I’ve been running interactive tutorial software on it. Being that I’m not a medical doctor, I’ve failed every tutorial – apparently I have no idea how hyperthermia works. But the other librarians and I have had fun competing anyway. Below, you’ll see our current reigning champion:

[Image: our current reigning champion]

Next, I’m nearing a completed version of my white paper on the current status of software preservation. While the paper is available to the NLM community for comment, it will not be publicly available until comments have been received and considered. So, you can look forward to that!

I presented a poster at iPRES 2015 and had a wonderful time doing so. The conference was incredibly informative, and I enjoyed being able to contribute. Here, you’ll find an image of the poster I presented. If you have any questions about it, please don’t hesitate to get in touch.

Finally, I’m preparing a presentation at NLM to show off some of the materials we’ve acquired in the project so far (including RFPs from the 1960s, legacy software, and educational materials from the 1970s forward). For the presentation, I also prepared a list of suggested further reading and other projects to check out. While some of the articles are more in-depth than others, I think many of these links will be of interest to a wider audience. So, please, take a peek:

Articles, Research Reports, and Recorded Lectures

Ongoing Projects

  • Olive Executable Archive, ongoing research project on emulation and software preservation at Carnegie Mellon University
    • Currently in testing at Stanford University Libraries
  • EaaS: Emulation as a Service, ongoing project on emulation at the University of Freiburg
    • Currently in use at Yale University Libraries
  • Video Game Play Capture Project at The Strong National Museum of Play
  • Recomputation.Org, project for the University of St. Andrews that works to preserve software in order to ensure the ability to recompute computational science experiments over at least a 20 year period
  • Astrophysics Source Code Library (ASCL), project for the University of Maryland that registers academic source code from astrophysicists and provides unique IDs for citation; working on the possibility of archiving the code as well

Super Fun Online Examples

  • Theresa Duncan CD-ROMs, project from Rhizome that makes interactive CD-ROMs from the artist Theresa Duncan available online through EaaS
  • Agent Ruby, early web AI by Lynn Hershman Leeson commissioned by the San Francisco Museum of Modern Art as part of a larger art project on computers and romance, including the film Teknolust
  • The Internet Archive’s Software Collection, includes early games as well as other types of software run through an emulator (JSMESS)

Creating Intellectual Boundaries in Complex Computing Environments: A Small Way in Which Copyright May Actually Help Software Preservation

Early on in my software preservation project, I wrote a blog post about the difficulties of researching software history. One of the issues I list is establishing the boundaries for a particular piece of software. As I have inventoried the software developed at the National Library of Medicine (NLM), I have been forced to establish what constitutes a single piece of software each time I create a new record. Software goes through many versions, so naturally the preservationist needs to create temporal boundaries. In other words, they need to decide which version or versions of the software need to be preserved. However, it is far more difficult and just as necessary to decide what constitutes one piece of software within a complex system that relies on many moving parts in order to function properly over time.

At NLM, for example, developers experimented with creating a “coach” for Grateful Med. Grateful Med was a user-friendly front-end system that searched NLM’s networked databases. Named Coach Metathesaurus Browser, this software was designed to hook into Grateful Med and provide assistance to the user when the user’s search queries returned inadequate responses. This piece of software was fully developed and tested in NLM’s Reading Room version of Grateful Med, but, for a variety of reasons, it was not adopted widely. While conducting my inventory, I decided that Coach Metathesaurus Browser constituted a separate piece of software and therefore justified its own record. My reasoning was that it had a separate institutional history from Grateful Med and that this history needed to be acknowledged. But, if Coach Metathesaurus Browser had been implemented, users may have interpreted it as a feature of Grateful Med and not as its own entity. What this means is that if Coach Metathesaurus Browser had been implemented, I would have been required to prioritize either the experience of the developer or the experience of the user in my inventory. Considering the goals of my current project, I would have made the same decision and inventoried Coach Metathesaurus Browser separately, but it is important to note that this is a decision with intellectual ramifications.
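To make the boundary decision a little more concrete, here is a small Python sketch of how an inventory could record it. The `SoftwareRecord` class and its field names are hypothetical, invented for this illustration. They are not the schema I actually use at NLM. The idea is simply that a separately inventoried piece of software can still point back to the system it hooked into, with a note explaining why the line was drawn there.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareRecord:
    """One inventory entry (hypothetical fields, for illustration only)."""
    title: str
    versions: List[str] = field(default_factory=list)  # temporal boundaries
    related_to: Optional[str] = None  # title of the system it depends on
    boundary_note: str = ""           # why this is its own record


def records_related_to(records: List[SoftwareRecord], title: str) -> List[SoftwareRecord]:
    """Return every record that declares a relationship to `title`."""
    return [r for r in records if r.related_to == title]


grateful_med = SoftwareRecord(title="Grateful Med")
coach = SoftwareRecord(
    title="Coach Metathesaurus Browser",
    related_to="Grateful Med",
    boundary_note="Inventoried separately: distinct institutional history.",
)
inventory = [grateful_med, coach]
```

Recording the relationship and the reasoning in the record itself means a future researcher can see that the boundary was a choice and can reconsider it.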

The problem of boundaries offers an unexpected view of the role of copyright in software preservation. Generally, copyright is only viewed as an obstacle for an institution or individual that wishes to preserve software. It limits what can be done to a piece of software without the consent of the copyright holder and presents a serious issue for the long term access to the cultural heritage inherent in software. Yet, copyright may benefit software preservation projects in one way. Whereas I am working with materials that are not under copyright, a preservationist working with copyrighted materials already has boundaries imposed on a piece of software. The legal structure of copyright requires a clear definition of what is in an individual piece of software and therefore protected by the law. What this means is that boundaries around a piece of software are created at the time that the software is developed and by someone affiliated with the software development project.

Because NLM is a government entity, its software is not under copyright. As I create records in my inventory, I establish what constitutes an individual piece of software, and I draw the intellectual boundaries around that software. While the history of NLM and the history of software development inform these decisions, there is not always a clear right answer. If NLM’s in-house developed software were copyrighted, I could rely on the logic of copyright and the individuals who held that copyright in order to draw these boundaries. In other words, researching the copyright of a particular piece of software may remove the need to make a decision that could cause inaccuracies or anachronisms in a record.

When compared to the obstacles that copyright creates for most of software preservation, this one possible benefit is almost negligible. Yet, highlighting this unexpected aspect of the relationship between copyright and software preservation demonstrates the ways in which an archivist is reliant on context in order to decide what belongs in a collection and how it ought to be described. Where contextual information is scarce, as is frequently the case for software development and use, copyright can provide necessary information. Although this observation is not pertinent to my current project, I will continue to make decisions about what constitutes an individual software project very carefully because I understand how these decisions may affect future perceptions of historic software and computing behavior.

Oral Histories of NLM’s Software Development: The Not Always Smooth Process of Technological Change

As a part of my National Digital Stewardship Residency (NDSR) project, I’m interviewing staff members about their experiences developing and using software at the National Library of Medicine (NLM). Doing these interviews may be my favorite part of the job right now. Each of these staff members has had invaluable insight into the cultural and technical aspects of software development and into the history of NLM itself. Not to mention, I’ve heard a funny story or two.

Most recently, my mentor and I drove to Frederick, MD to interview a retired staff member about her work on NLM’s software products. Before I delve into the interview, I need to thank my mentor for joining me on this Thursday morning adventure and for BBQ after. My mentor is a total mensch.

We drove to Frederick to interview Rose Marie Woodsmall, who worked at NLM for thirty years and was instrumental to AIM-TWX, Grateful Med, and innumerable other projects. Throughout the first two months of my project, her name kept popping up in interviews, in documentation, and even in casual conversations with staff members. Organizing the interview, however, was a bit difficult as Woodsmall had retired and started a sheep farm in rural Maryland. Perhaps my only regret from this interview was that I did not get to meet any of the sheep. Even without the sheep’s input, the interview provided amazing background and context.

The interview was filled with anecdotes, including how Grateful Med got its name. (You’ll have to wait for a later post with that story. I have a long blog post planned about naming conventions at NLM and how that affects research and institutional culture.) Perhaps the most interesting aspect of the interview, however, was how Woodsmall was able to outline the lived experience of major technological shifts in computing and how the technological changes were not always smooth. Woodsmall frequently acted as a liaison between software users and developers on development projects, and she had to navigate relationships between many stakeholders.

The overall push of software development through the years that Woodsmall was a staff member at NLM was towards more user-friendly systems. MEDLARS I, the earliest of NLM’s computerized catalog search systems, required an expert user. Implemented in 1964, it was only searched by specially trained medical librarians, and the training was two weeks long! These medical librarians worked in libraries and hospitals around the world and helped facilitate access to NLM’s resources, even at a geographic distance. It was a big step forward in medical librarianship and medicine in general.

After MEDLARS I, however, came MEDLARS II, which allowed for online search capabilities and was slightly less cumbersome to use, although not ‘user-friendly’ or ‘online’ in the ways we understand those terms today. After implementation in 1971, MEDLARS II still required specialized training, and although it was ‘online’ and allowed remote searches of databases in real time, it was not connected to the Internet. In fact, the Internet had not yet been invented. MEDLARS II relied on dedicated phone lines to establish its network connection, and computers that were connected to MEDLARS II were not necessarily connected to any other networks. Some machines were dedicated to simply searching the NLM databases. These machines did not look like computers today. Some were teletype machines, like the one pictured below.

Image provided by the Providence Public Library

Although MEDLARS II sounds very limited considering current search capabilities on the Internet, it was another step forward for medical librarianship. It seems like common sense to assume that everyone welcomed the switch from batch-processing to online search. After all, why not make life easier? However, as Woodsmall pointed out, the switch was not universally applauded. Some of the search analysts, accustomed to their particular place within their individual institutions, were suspicious of the new technology and were concerned about losing their sense of prestige. One librarian even hid the teletype machine used to connect to the network in a closet. She did not want her patrons to use it, and she did not want them to see her use it. Eventually, people saw its utility and the transition became smoother.

Woodsmall talked about similar issues when NLM began to implement Grateful Med widely. Many librarians, even up to the 1990s, were nervous about allowing end-users to search a database without their assistance. In a Letter to the Editor in the Bulletin of the Medical Library Association published in January 1994, Catherine J. Green, a librarian at Bethesda Memorial Hospital in Boynton Beach, Florida, quotes a recent lecture that argues that allowing end-users to search the database on their own “is not only a dumb idea, it is a dangerous idea!” Green and others were worried that doctors would inefficiently or inaccurately search for medical information using Grateful Med, thus impacting the quality of patient care. This concern is not unfounded, and although Grateful Med is now frequently praised for helping democratize access to medical information, this concern at the time of implementation is an important aspect of the technology’s history.

The point is that change is not always universally heralded, and it is possible to forget that fact when those changes are later determined to be technological progress. The moves from batch-processing to online search and then to end-user search capabilities are all seen as positive advancements in medicine and medical librarianship now. But, when these changes were implemented, they were not always seen as progress in medical librarianship. Disagreements and long discussions about the viability of these technologies cannot be overlooked because they are an integral part of the way that technological change occurred.

Interviewing someone like Woodsmall is helpful because she was able to highlight those discussions and the more contentious aspects of software development. Implementing new technologies frequently requires advocacy, both within the organization and with users. As I continue to research the history of software development at NLM, interviews will remain important as they help contextualize the technology in the disagreements, debates, and compromises that occur throughout development and implementation.

Vending Machines, Users, and My Cheez-It Problem

I have become very interested in vending machines. Vending machines have been used to sell candy, cigarettes, stamps, bicycle parts, socks, and live bait. People have even created library book vending machines, which is pretty cool.

Let’s get to the point though. The other day, I tried to get a bag of Cheez-It’s from the vending machine in the NLM canteen. The process was greatly impeded by a new digital interface, and I cannot stop thinking about the experience.

Let’s get some pictures here:

From the Minnesota Historical Society Flickr account

This is obviously an older vending machine and not the one I tried to get Cheez-It’s from. It is entirely mechanical and from the 1960s. You get your candy using levers and coin slots. I like the color, and I like the prices.

Now, here’s a picture of the one in the NLM canteen:

NLM canteen vending machine

It is important to note that parts of the design have not changed. We have a large machine with a glass front and a mechanism to tell the machine what you want. What has changed is the mechanism and the exact nature of the human-machine communication.

The first thing I noticed when trying to get my Cheez-It’s was that the prices are not listed near the food. I’m a price-conscious snacker, and I found this design feature stressful. The NLM canteen vending machine is less transparent about its pricing than the old vending machine pictured earlier. I had to click the interface three times before I could find the price of the snack, making comparison shopping very difficult. Comparison shopping on the 1960s vending machine only required a quick glance at the glass. In this way, clearly communicating pricing is accomplished more effectively and efficiently with less advanced technology.

Not to get too off the topic of vending machines, but I see this as part of a larger trend towards opaque pricing schemas and the overall blackboxing of technology. As our technology becomes more advanced, the exact mechanisms by which that technology succeeds are rendered more invisible, more indecipherable. As the technology becomes more convoluted, companies can take advantage of a general lack of understanding in order to price their technology goods, or the goods that their technology helps sell, in ways that a consumer would not appreciate if they better understood the exact ways in which the technology and the systems surrounding that technology function. But, I think Latour and blackboxing will be another blog post for another time. Back to vending machines!

The interface on the NLM canteen vending machine uses a grocery store cart as a design metaphor in order to help users understand how to navigate this overly complicated interface for getting snacks. At first, I thought this was strange. Yes, one buys food at grocery stores and at vending machines, but one rarely equates these two actions. I would never need a grocery cart at a vending machine, and I eat at vending machines pretty often.

Bring your cart to carry your Cheez-It’s

But, while I was ranting about this vending machine to fellow NDSR resident Valerie Collins (I told you I haven’t been able to stop thinking about it), she pointed out something I hadn’t considered. The shopping cart design is not taken from a grocery store; rather, that design metaphor is taken from internet shopping sites, like Amazon.

Screenshot from August 14, 2015

The resemblance seems pretty clear, and since the interface is digital, I am inclined to think that the designer borrowed the idea for the shopping cart from internet sites rather than from physical grocery stores.

What we see here, then, is a design metaphor taken from the physical world (grocery stores) implemented online (at Amazon and other sites) and then re-implemented on a physical machine with a digital interface. This vending machine helps illustrate a feedback loop for design practices that, in the end, may not serve to create better-designed goods. If I have not made it clear already, I hate this vending machine. I find it overly complicated, unhelpful, and generally poorly conceived. The implementation of a digital interface does not make the machine friendlier to users, and I cannot imagine it is easy to repair either.

So, the question becomes, why switch to a digital interface when it does not serve the users well? Why design with more advanced tools if those tools are not intrinsically helpful for that use case? Perhaps the new digital interface serves the needs of the company in some way that is not apparent to me. Does it produce tracking data that the company finds helpful? I don’t know. But, I will be sure to consider my experiences with the NLM canteen vending machine as I plan and design tools for library and archive users. Sometimes the best tool is not the most advanced one.

One last lesson learned: people look at you like you are crazy when you take cell phone photos of vending machines.