Brief NDSR Project Update

Over the past couple of months, I’ve been chipping away at my NDSR project, and I figure it’s time to give a proper update on my progress.

First, I’ve got a working Apple II, and I’ve been running interactive tutorial software on it. Since I’m not a medical doctor, I’ve failed every tutorial – apparently I have no idea how hyperthermia works. But the other librarians and I have had fun competing anyway. Below, you’ll see our current reigning champion:

[Photo: our current reigning champion at the Apple II]

Next, I’m nearing a completed version of my white paper on the current status of software preservation. While the paper is available to the NLM community for comment, it will not be publicly available until comments have been received and considered. So, you can look forward to that!

I presented a poster at iPRES 2015 and had a wonderful time doing so. The conference was incredibly informative, and I enjoyed being able to contribute. Here, you’ll find an image of the poster I presented. If you have any questions about it, please don’t hesitate to get in touch.

Finally, I’m preparing a presentation at NLM to show off some of the materials we’ve acquired in the project so far (including RFPs from the 1960s, legacy software, and educational materials from the 1970s forward). For the presentation, I also prepared a list of suggested further reading and other projects to check out. While some of the articles are more in-depth than others, I think many of these links will be of interest to a wider audience. So, please, take a peek:

Articles, Research Reports, and Recorded Lectures

Ongoing Projects

  • Olive Executable Archive, ongoing research project on emulation and software preservation at Carnegie Mellon University
    • Currently in testing at Stanford University Libraries
  • EaaS: Emulation as a Service, ongoing project on emulation at the University of Freiburg
    • Currently in use at Yale University Libraries
  • Video Game Play Capture Project at The Strong National Museum of Play
  • Recomputation.Org, project at the University of St Andrews that works to preserve software in order to ensure that computational science experiments can be recomputed over at least a 20-year period
  • Astrophysics Source Code Library (ASCL), project at the University of Maryland that registers academic source code from astrophysicists and provides unique IDs for citation; the project is also exploring the possibility of archiving the code itself

Super Fun Online Examples

  • Theresa Duncan CD-ROMs, project from Rhizome that makes interactive CD-ROMs from the artist Theresa Duncan available online through EaaS.
  • Agent Ruby, early web AI by Lynn Hershman Leeson commissioned by the San Francisco Museum of Modern Art as part of a larger art project on computers and romance, including the film Teknolust
  • The Internet Archive’s Software Collection, includes early games as well as other types of software run through in-browser emulation (JSMESS)

Data-Centric Decision Making: Assessing Success and Truth in the Age of the Algorithm

The New York Times published an article last weekend about the state of white-collar workers at Amazon, and everyone has been abuzz about it since. I know I’m a bit late to this discussion, but I do have a day job. Anyway, the article portrays a toxic environment, where workers are pressured to work through health crises and are encouraged to tattle on each other through online apps. There is a lot to talk about in the article, and honestly, I’m not sure how much new ground I am going to cover. However, I feel the need to take this opportunity to talk about data-centric decision making, what it means for labor, and what it means for our understandings of truth, fact, and assessment. That is, of course, a lot to discuss, and I will not be able to cover all of these ideas adequately, but with the article in the public sphere right now, it seems like a good time to discuss what ‘data’ can mean, especially in relation to preexisting power structures.

First and foremost, Amazon employs data-centric management. Productivity is calculated and shared. An employee’s performance is directly tied to quantifiable actions – the number of Frozen dolls bought, the number of items left un-purchased in the cart, etc. Being good at your job, in this situation, isn’t just about being competent; it is about constantly proving competence through data. This idea is not terrible at its root. For a couple of reasons, I kind of dig it. According to this 2005 study, creating quantifiable criteria can help managers avoid discriminating against women in hiring practices. Creating specific data points, in this situation, can help managers ensure that they view candidates in an equitable manner. I am sure you could find more examples where deferring to ‘data’ or quantifiable criteria for job performance helps people who have been historically discriminated against in the workplace.
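As a toy illustration (the rubric and candidates here are entirely hypothetical, not drawn from the study), quantified hiring criteria amount to scoring every candidate with the same explicit, pre-declared rubric:

```python
# Toy illustration of quantified hiring criteria -- the rubric weights
# and candidates are hypothetical. Every candidate is scored by the
# same math against the same pre-declared criteria, leaving less room
# for ad-hoc, potentially biased judgment calls.
RUBRIC = {"years_experience": 2, "relevant_degree": 3, "portfolio": 5}

def score(candidate):
    """Weighted sum over the shared rubric; identical for everyone."""
    return sum(weight * candidate.get(criterion, 0)
               for criterion, weight in RUBRIC.items())

candidates = [
    {"name": "A", "years_experience": 4, "relevant_degree": 1, "portfolio": 2},
    {"name": "B", "years_experience": 2, "relevant_degree": 1, "portfolio": 3},
]
ranked = sorted(candidates, key=score, reverse=True)
```

Note that the equity benefit comes entirely from the criteria being explicit and applied uniformly; as the next section argues, the rubric itself is still a human construction.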

But, what happens when you stop paying attention to how those criteria are quantified? What happens when ‘data’ becomes code for ‘objective truth’ rather than what it is – a human-constructed method for measuring reality that can fall prey to discriminatory practices in the same way as other modes of assessment? An algorithm is not a God-given measurement of truth; it is subject to the same prejudices and flaws as any other human invention.

I will give you an example to demonstrate how this phenomenon can occur. Safiya Umoja Noble, a professor in UCLA’s Information Studies Department, has written extensively on how search engines portray women of color. In an article for Bitch Magazine, she describes what happened when her students searched for ‘black girls’ online. SugaryBlackPussy.com was the first result. To be clear, the students did not mention porn in their search. This website was the first result for the simple query, ‘black girls.’ She describes similar results for ‘Latina girls’ and other women of color. How could this happen? Should porn really be the first result for ‘black girls’? What about Google’s algorithm determined porn to be the most relevant search result?

After a moment’s thought, the answer is obvious. Google’s algorithm seems to take popularity into consideration when sorting results, meaning that if more people click on porn, then porn rises higher in the results. Of course, Google’s algorithm is proprietary and secret, but this assumption does not seem outlandish. There is a lot to be said about what using popularity as a criterion for search engine results means for the representation of minorities, but it is best to read Noble’s work to get a thorough discussion of those matters. You’ll learn a lot more that way than if I try to summarize her work for you. Instead, I would like to make a simple point: the search engine is not infallible. It is a human-designed device that reflects preexisting human priorities. When these priorities are problematic, so are the search results.
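To make the feedback loop concrete, here is a deliberately simplified sketch – emphatically not Google’s actual algorithm, which is proprietary – of how a click-popularity signal can entrench whatever happens to rank first:

```python
# A deliberately simplified, hypothetical ranking model -- NOT Google's
# actual algorithm -- showing how a click-popularity signal feeds back
# on itself: whatever ranks first attracts the most clicks, which
# raises its popularity score, which keeps it ranked first.

def rank(results):
    """Sort results by accumulated clicks, most-clicked first."""
    return sorted(results, key=lambda r: r["clicks"], reverse=True)

def simulate(results, rounds, clicks_per_round=100):
    for _ in range(rounds):
        ordered = rank(results)
        # Most users click the top result, so the signal compounds.
        ordered[0]["clicks"] += clicks_per_round
    return rank(results)

results = [
    {"url": "site-a.example", "clicks": 105},  # slight initial lead
    {"url": "site-b.example", "clicks": 100},
]
final = simulate(results, rounds=10)
# A 5-click head start snowballs into a gap of over a thousand clicks:
# the ranking reflects its own history, not any external relevance.
```

The point of the sketch is that ‘popularity’ here measures the system’s past behavior, not truth – which is exactly why a problematic early result can stay on top.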

In the ‘information age,’ or whatever we are calling it at this point, what this means is that preexisting priorities can be amplified in a way that they were not before. More people see these results, and more people can be influenced by them. The search engine does not necessarily provide truth, accuracy, or expertise. Instead, it can provide a magnified version of the problematic, inaccurate, and hurtful representations that have been created and enforced over time. The algorithm is not truth; the algorithm is media.

Back to Amazon. It may feel like I’ve ventured far from that New York Times article, but I haven’t really. As with a search engine, the methods of measuring worker productivity are as subject to human fallibility as any other way of measuring output. As humans create new ways to measure success, those measurements will reflect preexisting notions of what success means and what a successful person looks like. When this phenomenon is hidden behind a perceived objectivity in numerical assessment, it is more difficult to argue against. When a manager can simply point to the ‘data’ rather than having an in-depth conversation about what worker output should look like, the workers themselves are left at a loss. In order to participate in negotiations at this level, workers either need a high-level understanding of how the criteria for success are quantified, or they need to excel within the manager’s data-centric assessment system. And, clearly, excelling in a manager-designed assessment system may not best serve the needs of the worker.

There is a lot more to talk about here, but it will have to wait for another day. I’ll just leave you with one suggestion: it is time we stop asking to see the numbers and start asking to see the math.

Vocabulary Forensics, Digital Media, and Technological Change

I’ve created a new game, and while I’m sure only a few people will find it fun, I think the game illustrates a wider issue in preserving digital media and technological tools. The game, most simply put, is to guess the shelf-life of a word.

Slang comes in and out of style pretty frequently. People aren’t saying, “That’s haaawt,” the same way they were in 2002 under Paris Hilton’s dubious influence. But technical language also falls in and out of use, based largely on the objects and infrastructures to which the vocabulary is tied.

Let me illustrate with a quick example: “smartphone.” While the lexical construct of “smart + object” has retained a fair bit of influence with things like “smart-fridges,” the word “smartphone” may not be long for this world. When is the last time a Verizon commercial touted the number of smartphones they offered? At this point, at least within a certain demographic, we just call them phones. The consumer-level technology has advanced in such a way that using the term “smartphone” is redundant. The internet-connected phone is no longer noteworthy; the flip-phone is.

In this way, the use of the word “smartphone” decreased as the market’s ability to provide that object increased. At least in the United States, smartphones are so ubiquitous that we have largely dropped the “smart” and are back to simply “phone.”
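The shelf-life game can even be roughly quantified: given yearly frequency counts for a term (the numbers below are invented for illustration, not real corpus data), a few lines of code can estimate when a word peaked and whether it is in decline:

```python
# Hypothetical yearly usage counts for the word "smartphone" -- these
# numbers are invented for illustration, not drawn from a real corpus.
counts = {
    2004: 120, 2006: 900, 2008: 4200, 2010: 9800,
    2012: 12000, 2014: 9500, 2016: 5100, 2018: 2600,
}

# The year with the highest count is the word's peak.
peak_year = max(counts, key=counts.get)

# Call the word "declining" if its most recent count has fallen
# below half of its peak count.
latest_year = max(counts)
declining = counts[latest_year] < counts[peak_year] / 2

print(f"Peak year: {peak_year}, declining: {declining}")
```

With real data from a source like a newspaper archive or web corpus, the same rise-and-fall curve could be plotted for any technical term.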

This observation is relatively simplistic. Of course language changes to reflect the lived, human environment, and technology is a key aspect of that environment. Personally, I am a little astounded by how quickly the word came into and fell out of use (a little over 10 years by my count), but this isn’t terribly interesting on its own. It does, however, illustrate an issue in undertaking a digital preservation project, particularly one that focuses on preserving software, like my current work at the National Library of Medicine. As I familiarize myself with obsolete technology, I also need to familiarize myself with obsolete language. Let’s just say that the word for “back-end” does not seem to be as stable as one may have assumed.

Introducing…

Hello! My name is Nicole Contaxis, and I’m currently in Washington DC as part of the National Digital Stewardship Residency. More on that later.

In general, I’m interested in digital preservation, the semiotics of computation and digital media, and the history and rhetoric of technology. Or, more clearly said, I spend most of my time thinking about how we communicate and the ramifications of both the content and nature of that communication. Some of this blog will cover these types of issues and will deal with technology and history more widely.

However, many blog posts will deal with my current project as a part of the National Digital Stewardship Residency. I am working at the National Library of Medicine on a project titled “NLM-Developed Software as Cultural Heritage.” What that means is that I’m trying to track down all of the software developed at NLM and design a preservation strategy for it. Considering that NLM has a 40-year history of developing software for internal needs and for their users, I’ve got my work cut out for me. Nevertheless, I’m pumped. Software is a huge part of the lived experience, and I’m excited to play my part in ensuring long-term access to executable files.