Over the last year, archivists at Emory University’s Manuscript, Archives, and Rare Book Library (MARBL) processed thirty-seven 3.5” floppy disks included amongst the papers of writer and activist Alice Walker. Early in this process, forensic images of the disks’ contents were captured and these bit-by-bit replicas of the data were stored as preservation copies. Digital Archives literature has focused on imaging as a practical solution for the transfer of data from aging and often obsolete media, and this article documents attempts at MARBL to put these recommendations into practice.
In 2007, the Manuscript, Archives, and Rare Book Library (MARBL) at Emory University acquired the papers of writer and activist Alice Walker. Amongst the collection were a number of born digital materials, including thirty-seven 3.5” floppy disks. Labels on the disks provided some information about their content, indicating that they contained a mix of literary drafts, correspondence, speeches, and writings connected to Walker’s work as an activist around the world.
Walker’s floppy disks joined a steadily growing collection of digital media and hardware already in MARBL’s collections, a trend in which MARBL is certainly not alone. In a survey of special collection libraries conducted by OCLC in 2010, seventy-nine per cent of respondents reported having born digital materials among their collections. In the four years since OCLC published these findings, this number is likely to have increased.
Amongst members of the archival community, this has raised questions about what happens to born digital materials once they have entered collecting institutions. Ben Goldman, for example, worries that archives are facing a growing “‘disk-in-a-box’ problem”: incoming born-digital materials are stored in boxes, often indefinitely, for sheer lack of knowing what else to do with them. Unfortunately, digital media left in this way is increasingly likely to suffer from bitrot and degradation. In the case of floppy disks, the magnetic media storing the data can demagnetize over time or be lost as the adhesive that holds it in place becomes less effective. Alternatively, dust and dirt can contaminate and damage the surface of the disk, either destroying the data it contains or rendering it inaccessible. Transferring data off such vulnerable media is, therefore, a key step towards its preservation.
Useful guidelines and resources designed to help archivists transfer data from aging, often-obsolete media are available, but the idiosyncrasies of old disks often lead to unexpected obstacles. The following article documents the various ways in which archivists at MARBL approached the Alice Walker disks, describing both the challenges and the successes. It was an experience that involved a great deal of careful trial and error, based on current best practices in the field of digital archives.
The Imaging Process
Borrowing from techniques used in digital forensics, archivists at MARBL decided to capture forensic images of the disks from the Alice Walker collection. This creates an exact copy of the disk’s content at the bitstream level, thereby ensuring no loss of data during transfer. From a preservation perspective, this makes forensic imaging ideal; an exact copy of the original data can be ingested into a secure and long-term storage environment. Derivative copies can later be used for continued appraisal and processing.
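As a minimal sketch of what capture at the bitstream level involves (this is not the tooling used at MARBL), the Python below copies a source byte for byte in fixed-size blocks and then checks that the copy hashes identically to the bytes that were read. The function names `image_bitstream` and `sha1_of`, the 512-byte block size, and the file paths are assumptions made for illustration; in practice the source would be a write-blocked device rather than an ordinary file.

```python
import hashlib


def image_bitstream(source_path, image_path, block_size=512):
    """Copy source to image block by block; return the SHA-1 of the bytes read."""
    digest = hashlib.sha1()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the source reached
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def sha1_of(path, block_size=65536):
    """Hash an existing file without loading it into memory all at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_image(source_hash, image_path):
    """True when the stored image hashes to the value recorded during capture."""
    return source_hash == sha1_of(image_path)
```

Calling `image_bitstream("source.bin", "copy.img")` and passing the returned hash to `verify_image` confirms that nothing was altered between reading the source and writing the copy. Every byte is copied, including unallocated space, which is what distinguishes an image from a file-by-file copy.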
Using a Modern USB Floppy Disk Drive
Initial imaging attempts used FTK Imager, a free tool distributed by AccessData as an alternative to their subscription-based product FTK ToolKit. FTK Imager offers less functionality than FTK ToolKit in terms of post-imaging appraisal, but does allow users to create forensic images and view the captured data, either by mounting the image or by accessing it through FTK Imager’s user interface. An external 3.5” floppy disk drive was connected to the workstation running FTK Imager via the Tableau Ultrablock USB write blocker, which prevents any changes being made to the data in the process of imaging. FTK Imager is quite easy to use; users point the software to the connected disk drive, select their choice of file format for the image, and enter some basic metadata that records details of the capture. At MARBL, the AFF file format was chosen for disk images because it is non-proprietary and packages the image with related forensic metadata. Once imaging is complete, the software calculates MD5 and SHA-1 checksums in order to verify that the capture was successful. Its output, following a successful capture, includes the image file, a text file including some technical metadata and fixity information, and a CSV file listing the file names and paths of data contained on the imaged media. This additional metadata, automatically generated as part of the imaging process, makes FTK Imager a particularly useful tool.
Unfortunately, FTK Imager was only able to capture images of a handful of the floppy disks from the Alice Walker papers. When unsuccessful, the software frequently failed to recognize the external floppy disk drive or became unresponsive. Nor was this problem necessarily remedied by removing the problem disk and replacing it with another; on several occasions, an unreadable disk seemed to corrupt the entire imaging process, requiring that the drive be switched off and disconnected from the computer workstation, and the software restarted. Only then would another, otherwise readable, disk be recognized. As might be expected, this slowed the imaging process substantially.
Further attempts were made using another imaging tool, Acronis Backup & Recovery. As the name suggests, this is a proprietary product designed to help small businesses back up their data through the capture of forensic images. However, imaging attempts using this tool and the same configuration of hardware as had been used with FTK Imager were no more successful.
During the acquisition process, archivists at MARBL had learned that Alice Walker was a Macintosh user. The formatting of floppy disks varies depending on the operating system, which often means that a disk formatted for an Apple computer—as Walker’s likely were—can only be read by an Apple computer. Knowing this, a second attempt at imaging the remaining disks was conducted using a Macintosh workstation and its native imaging software, Disk Utility. Again, an external 3.5” floppy disk drive was connected to the workstation via the Tableau Ultrablock USB write blocker. Like FTK Imager, Apple’s Disk Utility can be used to capture bitstream images of digital media. Unlike FTK Imager, however, it generates no metadata during that process. This approach to imaging the thirty-seven floppy disks was only marginally more successful than earlier attempts using FTK Imager on a Windows workstation. Having worked through the entire set of disks using Disk Utility, only two additional images had been captured.
Because of the lack of metadata produced by Disk Utility, extra steps were required following the successful capture of these two disk images. MD5 and SHA-1 checksums were generated for each image using the command line and Mac’s Terminal application. These were stored in an Excel spreadsheet, along with details about the image capture—such as the date, hardware and software configuration used, and archivist responsible. Another disadvantage to Disk Utility is that it will only create images in Apple’s proprietary DMG file format. This became problematic as processing of the disk images continued using FTK Imager, which does not support the DMG format, to view and extract files.
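The manual fixity step described above could be scripted along the following lines. This is a hedged sketch rather than the procedure actually followed at MARBL: it computes MD5 and SHA-1 checksums for an image in one pass and appends them, with the capture details, to a CSV log instead of an Excel spreadsheet. The function names, the column names in `FIELDS`, and the log file layout are all assumptions made for illustration.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column layout for the capture log.
FIELDS = ["image", "md5", "sha1", "capture_date", "configuration", "archivist"]


def checksums(image_path, block_size=65536):
    """Return (md5, sha1) hex digests, reading the image once in blocks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(image_path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            md5.update(block)
            sha1.update(block)
    return md5.hexdigest(), sha1.hexdigest()


def log_capture(log_path, image_path, capture_date, configuration, archivist):
    """Append one row of fixity and capture metadata to a CSV log."""
    md5, sha1 = checksums(image_path)
    row = {
        "image": Path(image_path).name,
        "md5": md5,
        "sha1": sha1,
        "capture_date": capture_date,
        "configuration": configuration,
        "archivist": archivist,
    }
    new_log = not Path(log_path).exists()
    with open(log_path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_log:  # write the header only when the log is first created
            writer.writeheader()
        writer.writerow(row)
    return row
```

Recomputing the checksums later and comparing them with the logged values shows whether an image has changed in storage.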
Using a Modern Floppy Disk Drive with Integrated Write Blocking Functionality
The use of a Macintosh workstation for imaging, as opposed to a Windows workstation, barely reduced the number of floppy disks that remained to be captured. This suggested that incompatible operating systems were not the sole cause of earlier difficulties. Research into possible alternative causes, which consisted largely of conversation with archivists working at other institutions, turned up information about external floppy disk drives with integrated write blocking functionality. Sources suggested that the use of such dual-purpose hardware could help the imaging process by reducing the number of intermediaries between the disk drive and the imaging software. Unfortunately, these drives proved difficult to come by. Digital Intelligence has discontinued their line of 3.5” floppy disk drives with internal write blocker, but MARBL was able to purchase one from their remaining stock. Due to the drive’s software requirements, a new workstation running a 32-bit version of Windows XP was configured for use with the Digital Intelligence drive. FTK Imager was downloaded to this workstation and the imaging process was reattempted with the remaining Alice Walker disks. This approach was more successful, resulting in the capture of ten disk images. There were also fewer instances where FTK Imager became unresponsive.
Using Contemporary Hardware and Operating Systems
At this stage, images of thirteen floppy disks had been successfully captured, but twenty-four remained. Knowing that the disks had likely been used with an Apple computer dating from the 1990s or 2000s, a next step was to try imaging on a contemporary machine. The hope was that a machine dating from the same period as the digital media would be better equipped to read it. Two such machines were available: a Power Macintosh G4, dating from the early 2000s, and a Power Macintosh 7300/200, dating from 1997.
Of the two, the G4 was easier to set up because it is recent enough to be compatible with current peripheral devices. Initial attempts began with this machine using the same configuration of hardware—a 3.5” floppy disk drive connected via the Tableau Ultrablock USB write blocker—as had been used previously at other workstations. The G4’s operating system, Mac OS 9.2, came equipped with an earlier version of Disk Utility, called Disk Copy, which was used in this fourth attempt to capture forensic images of the remaining disks. Unfortunately, this approach was no more successful than earlier attempts had been.
One advantage to the 7300/200 was its original, internal 3.5” floppy disk drive. As had been the case with the 3.5” floppy disk drive with internal write blocker, this was seen as advantageous in that it reduced the level of mediation required between the imaging software and the disks. Setting up the workstation for use required some extra effort, however; the Digital Archives unit worked closely with the library’s IT department to locate the peripheral devices, cables, and adaptors needed for its operation. Like the G4, the 7300/200 was equipped with Mac’s early imaging application Disk Copy running on Mac OS 9. There were two disadvantages to capturing disk images using this machine. Using an internal floppy disk drive on an older machine resulted in one fewer layer of write blocking protection and meant that the archivist had to rely entirely on the 3.5” floppy disks’ write blocking tabs. This was by no means a serious obstacle, but did require some extra care during the imaging process. More significant were difficulties related to how images could be transferred off the 7300/200, which has no USB port, following successful capture. Had attempts at imaging using this machine been more productive, it is likely that a solution to these challenges could have been found. As it happened, only a few of the twenty-four remaining disks could be captured using the 7300/200, and focus consequently shifted to another piece of hardware newly acquired by Digital Archives at Emory: the KryoFlux.
Using the KryoFlux
Further research into the recovery of data from 3.5” floppy disks had uncovered the KryoFlux, a floppy disk controller card developed by the Software Preservation Society, as a potential solution to the problems encountered so far. Documentation and reviews about the KryoFlux confirmed that USB floppy disk drives and relatively modern PCs, of the sort used in earlier imaging attempts, were often incapable of supporting many of the less common or relatively less persistent floppy disk formats. In contrast, users of the KryoFlux can indicate the correct format of the disk in question and the KryoFlux will capture an image in that format. In the process, the KryoFlux verifies that each track does indeed comply with whatever format was selected and notifies the user in the case of a mismatch. In cases where the disk format cannot be determined, users can select KryoFlux’s STREAM format, which makes no attempt at interpreting the data prior to imaging, but rather reads data as an uncompressed raw stream generated via flux transitions on the magnetic media. The resulting image is not as easy to work with as it would be in a known format, but does at least preserve the data. Another advantage to the KryoFlux, according to the available literature, is its ability to better handle media suffering from degradation or bitrot.
Purchase of the KryoFlux includes a 3.5” floppy disk drive and all of the necessary cables, and, while connecting the board to a computer requires some care and attention, it is not an overly complex process. The KryoFlux comes equipped with write blocking functionality, triggered by the removal of a jumper on the board itself. Once connected, the KryoFlux’s software can be run either via the command line or through a GUI, which provides some visualizations of the imaging process in progress and allows the user to review its level of success via a color-coding system. Through some trial and error, the formatting of the remaining floppy disks was determined and attempts at capturing forensic images of the last twenty-four disks were finally met with success. The KryoFlux creates IMG files, which are supported by FTK Imager. These IMG files were then viewed using FTK Imager, which was also used to generate file-level metadata.
Over the course of imaging these thirty-seven 3.5” floppy disks, we experimented with seven different approaches to capturing forensic images, of which four were at least partially successful. Of those four, Disk Utility proved impractical because of its limitations in terms of image file format. The remaining three were FTK Imager, used with two different configurations of hardware, and the KryoFlux. Despite its tendency to crash, FTK Imager was able to capture a number of images and generated useful accompanying metadata. The KryoFlux’s success lay in its compatibility with multiple floppy disk formats, resulting, where all previous attempts had failed, in the capture of images from all of the remaining disks. No single approach provided a complete solution; figure 1 provides a breakdown of each method’s advantages and disadvantages. However, having come out of this experience with an improved understanding of when and why each approach might work best, MARBL is far better prepared to deal with data on floppy disks in the future.
In total, it took roughly nine months to successfully image all thirty-seven floppy disks. This included the time needed to acquire, set up, and learn how to use the necessary hardware and software. While this kind of work will likely always involve some element of exploration and flexibility, it is hoped that the experiences described here will help to significantly reduce the amount of time needed for imaging aging media in the future.
|IMAGING SOFTWARE/HARDWARE CONFIGURATION||ADVANTAGES||DISADVANTAGES|
|AccessData FTK Imager and external 3.5” floppy disk drive||
|Apple Disk Utility and external 3.5” floppy disk drive||
|AccessData FTK Imager and external 3.5” floppy disk drive with integrated write blocking functionality||
|Apple Disk Copy and Power Macintosh 7300/200||
|BitCurator (Guymager) and external 3.5” floppy disk drive||
In spite of the difficulties and delays, this was a productive process. Working with Walker’s floppy disks revealed the truly idiosyncratic nature of aging media and highlighted the necessity of persistence and patience. One result of using the KryoFlux, for example, was its reinforcement of digital media and hardware as mechanical objects capable of mechanical failures. Various mechanical features, like the head alignment of the drive, can have a significant impact on how easily the drive reads a disk. Images of a single disk captured using the KryoFlux varied widely in quality depending on the drive used. This is not to say that a particular drive was necessarily better or worse than others, but rather that certain drives are better equipped than others to read disks of a particular format. This is a good reason for having a number of different floppy disk drives on hand. Of course, procuring older drives is something of a challenge these days, but eBay, surplus stores, and local computer enthusiast groups can be good places to start when looking to build supplies.
The significance of a floppy disk’s formatting also became increasingly clear over the course of imaging. While there were other factors that seemed to affect the success of imaging, the formatting of the disk appeared ultimately to have the greatest significance in determining whether or not an image could be captured. Unfortunately, the format of a floppy disk is not always obvious. Nevertheless, through a combination of online resources and trial and error, it was possible to identify the formats of all of the disks imaged using the KryoFlux. Understanding this importance was also incredibly useful as it helped to inform decisions about how best the archivist might capture images of problem disks. This will doubtless prove to be invaluable as Digital Archives at Emory embarks on future projects like this one.
With that said, it should be emphasized that formatting alone should not necessarily be seen as the sole factor in determining the success of an imaging attempt. Dogged persistence was, in a number of cases, key to the imaging process, especially during attempts that used FTK Imager. A failed attempt at imaging was by no means an indication that second or third attempts would be met with the same results. Sometimes a second successful attempt could be achieved immediately following a failure; sometimes packing everything up and trying again after a space of one or two days proved more fruitful. Either way, experience showed that initial failure was not necessarily a sign of an unreadable disk and often simply a sign of a temperamental disk.
What the Alice Walker collection demonstrates very well is that no one approach provides a single one-stop solution for imaging aging digital media. Over the course of working with the thirty-seven disks in the collection, failure to capture successful forensic images required ongoing research and the continuous development and application of new methods. While, in this case, some of those methods were largely unsuccessful, they have still been documented within the Digital Archives department and will very likely be tried again when imaging floppy disks or other digital media in the future. Using a contemporary machine, for example, or a Macintosh as opposed to a Windows machine, might prove to be the key to success in future imaging endeavors in spite of that not having been the case here. One product of this work has been a documented workflow designed to guide archivists through the several approaches to imaging currently identified and explain in the process the advantages and disadvantages of each, in addition to the order in which each approach should be tried. This workflow developed out of a recognition that aging digital media is idiosyncratic and requires a multifaceted and flexible approach if work with them is to be successful. It is, without doubt, a living document that will continue to grow and change as archivists, both at Emory and beyond, continue to explore and develop new ways of capturing and preserving data stored on aging digital media.
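The idea behind such a workflow, an ordered list of approaches with each one retried before moving on, can be sketched in a few lines of Python. This is purely illustrative and is not MARBL's documented workflow: the function `try_approaches`, the number of attempts per approach, and the callables standing in for each imaging method are hypothetical, and the order in which approaches are supplied is left to the caller.

```python
def try_approaches(disk_label, approaches, attempts_per_approach=2):
    """Try each imaging approach in order, retrying before falling back.

    `approaches` is a list of (name, capture) pairs, where `capture` is any
    callable that takes the disk label and returns True on a successful
    capture. Returns (name_of_successful_approach_or_None, attempt_log).
    """
    log = []
    for name, capture in approaches:
        for attempt in range(1, attempts_per_approach + 1):
            succeeded = bool(capture(disk_label))
            # Record every attempt, including failures, for later review.
            log.append((disk_label, name, attempt, succeeded))
            if succeeded:
                return name, log
    return None, log
```

Keeping the full attempt log, rather than only the outcome, reflects the point made above that a failed attempt is worth documenting because the same method may succeed on another disk or on another day.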
A short while after the imaging of the Alice Walker disks had been completed, Digital Archives at MARBL began exploring the BitCurator suite of tools as another option for the imaging and processing of born digital materials. BitCurator is a collaborative project led by the School of Information and Library Science (SILS) at the University of North Carolina, Chapel Hill and the Maryland Institute for Technology in the Humanities (MITH), which aims to package together a number of digital forensics tools, including the imaging tool Guymager, for use within the archives profession. In the time since imaging of the Alice Walker disks was completed, only a handful of floppy disks have been imaged at MARBL using BitCurator. This limited experience has suggested that, in many ways, Guymager provides much of the same functionality as FTK Imager. Using a combination of Guymager and other tools available within the BitCurator suite, an archivist can achieve a level of automated metadata creation very similar to that generated by FTK Imager. Guymager can also produce a number of image file formats, including AFF. One of the main advantages to BitCurator overall is that it is being developed specifically for and in response to the archives profession. As such, it is an extremely promising tool for the field. From a narrower and more imaging-focused perspective, one advantage to BitCurator is its integrated write blocking software. This added functionality can take the place of write blocking hardware if necessary, thereby removing an intermediary layer sitting between the floppy disk and Guymager that can on occasion disrupt the imaging process.
Barrera-Gomez, Julianna, and Ricky Erway, Walk This Way: Detailed Steps for Transferring Born-Digital Content from Media You Can Read In-House (Dublin, OH: OCLC, 2013), http://oclc.org/content/dam/research/publications/library/2013/2013-02.pdf.
Born-Digital Program @ Stanford University Libraries, last modified February 6, 2013,
Dooley, Jackie M., and Katherine Luce, Taking Our Pulse: The OCLC Research Survey of Special Collection and Archives (Dublin, OH: OCLC, 2010), http://www.oclc.org/content/dam/research/publications/library/2010/2010-11.pdf?urlm=162945.
Erway, Ricky, You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content on Physical Media (Dublin, OH: OCLC, 2012),
Garfinkel, Simson L. “Providing Cryptographic Security and Evidentiary Chain-of-Custody with the Advanced Forensic Format, Library, and Tools,” International Journal of Digital Crime and Forensics 1, no. 1 (2009): 1-27.
Gengenbach, Martin J., “‘The Way We Do It Here’: Mapping Digital Forensics Workflows in Collecting Institutions.” Master’s thesis, University of North Carolina at Chapel Hill, 2012.
Goldman, Ben. “Bridging the Gap: Taking Practical Steps Toward Managing Born-Digital Collections in Manuscript Repositories,” RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage 12, no. 1 (2011): 11-24.
Lui, Gough. “Project KryoFlux—Part 2: Why Bother With It?” Gough’s Tech Zone. Last modified April 21, 2013, http://goughlui.com/?p=3014.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.