This article presents opportunities for the use of Google Analytics, a popular and freely available web analytics tool, to inform decision making for digital archivists managing online digital archives content. Emphasis is placed on the analysis of Google Analytics data to increase the visibility and discoverability of content. The article describes the use of Google Analytics to support fruitful digital outreach programs, to guide metadata creation for enhancing access, and to measure user demand to aid selection for digitization. Valuable reports, features, and tools in Google Analytics are identified and the use of these tools to gather meaningful data is explained.
Introduction
Recent scholarship focused on the intersection of libraries and web analytics has highlighted the value of web analytics tools in the enhancement of online library services and content. Web analytics has traditionally been defined as “the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.”[1] By using web analytics to understand web users’ interaction with library web content, information professionals can make more informed decisions regarding the development of digital library services and platforms. Web use data can inform site design, selection of content, and promotion and outreach strategies; it can also justify digital library expenditures by demonstrating evidence of use of web-based services and content. Given the value of web analytics for information service providers, information professionals must encourage a more widespread practice of web data gathering and analysis amongst the profession to ensure sustained success of digital library initiatives.
Web analytics provide unique opportunities for archivists developing and maintaining freely available digital archives and repositories. These online, open-access collections differ from traditional proprietary resources that are limited to a specific library user community. As Loftus discusses, the use of page views as an indicator of success is problematic for traditional library institutions as their potential user base is more limited [2]. Freely available digital repositories, however, seek to connect all potentially-interested web searchers with relevant archival assets. As a result of this critical difference in target audience, managers of library websites providing access to proprietary resources to a defined community and managers of freely available digital archives and repositories will likely approach the interpretation of web use data with different aims. Given the large potential audience and the high research impact ceiling for these freely available repositories, archivists have significant, unique opportunities to use web analytics to facilitate the use of and maximize the research value of their digital collections by driving traffic to their content.
This article will provide four practical examples of applied data analysis that uses Google Analytics, a freely available web analytics tool. These examples, pertaining specifically to freely accessible online archives and repositories, will describe several of the many ways that archivists can expand use and discovery of digital archival assets by web users. How can the effectiveness of digital outreach initiatives be evaluated and optimized? How can archivists uncover new digital outreach opportunities? How can the effectiveness of descriptive metadata in facilitating asset discovery be improved? How can digital archivists make more informed selection decisions in digital collection building? Each of these four topics will be addressed through the definition of relevant Google Analytics metrics, an explanation of methods for extracting and analyzing data gathered using the Google Analytics interface, and proposed strategies to use this data analysis to increase the use and visibility of a digital archival repository’s materials. Each suggested strategy will be supported with brief examples from the author’s analysis of web use data for the Ball State University Digital Media Repository (DMR), a freely available CONTENTdm-based digital archives developed by the Ball State University Libraries [3].
About Google Analytics
Released in 2006, Google Analytics is a freely available, user-friendly web analytics solution developed by Google that provides users with highly interactive, flexible, and detailed web analytics reporting [4]. Through Google Analytics’ main web-based interface, the user seeking to gather and analyze data builds a query based on specific metrics, defined as “individual measurements of visitor activity on your site,” and dimensions, which “break down metrics across some common criteria, such as country or browser. [5]” Google provides users with a wealth of robust yet accessible supporting documentation, including a detailed reference page outlining various metrics and dimensions that are available through the Google Analytics interface, demystifying web analytics language for beginning users [6]. The Google Analytics installation and set-up process is thoroughly and clearly described in Google’s supporting documentation for the Analytics service [7]. The creation of a Google Analytics account provides the would-be Analytics user with a tracking code snippet to be placed in the web page or web page template’s HTML document.
The service separates its standard reporting features into broad categories that group various reporting options by function. Audience reports allow a Google Analytics user to obtain data about the users of the web site being tracked, including their geographic location, the duration of their visits, the number of pages accessed in a visit, and the hardware and software they use to access the site. Acquisition reports allow Google Analytics users to obtain data regarding the ways in which users discover and access the site’s content, including a list of web search engine searches that brought users to the site and data regarding sites that referred users to the site directly. Behavior reports provide data regarding user behavior once the user has arrived on the site, illuminating, for example, which pages on the site are most heavily viewed and which pages on the site most frequently serve as a site’s entry point [8]. In 2011, Google Analytics added a Realtime component to its interface, which allowed for immediate access to data to support timely monitoring of site performance [9].
Literature Review
A substantial body of information science literature documents the use of web analytics tracking services, including Google Analytics, to analyze the use of library web pages and digital libraries; some research in this area has been expanded into the field of archival studies as well. The literature in this field thoroughly documents the value of web data analysis and the process of developing a web analytics program at a library institution. Much of the published scholarship on the topic of web analytics in libraries has focused on the use of data to improve site usability and user experience. Sumner et al. documented the planning of a web metrics strategy for the National Science Digital Library, identifying goals for the initiative as well as potential challenges [10]. Khoo et al. present a general overview of web metrics providers, valuable metrics, and factors affecting implementation, as well as brief implementation case studies involving four digital libraries specifically focusing on session length data. Khoo writes that session length data is a problematic measure of visit quality in the context of libraries, as some visitors may be looking to recall one item quickly while others may pursue extensive research [11]. Fang et al. analyze factors such as connection type and speed, browser type, screen resolution, page visits, and geographic statistics to better understand the Rutgers-Newark Law Library’s user base and demonstrate that increasing the prominence of certain pages further promotes their use by digital library users [12].
Black investigates the analysis of web log data at The Ohio State University Libraries to provide basic information regarding web user interaction, technology, and content preferences [13]. Paul investigates the implementation of Google Analytics on the University of Missouri’s library webpages, reporting on the library’s willingness to implement changes to links on pages but its reluctance to consider more widespread, large-scale changes based on data gathered [14]. Loftus discusses the use of Google Analytics by the Health Sciences Libraries of the University of Minnesota, including the use of custom variables to track data from more complex sites. His analysis explains how the institution used web use data to inform decisions to feature underused pages more prominently and how subsequent follow-up data analysis showed that the redesign achieved its aims [15]. Hess summarizes the functionality and installation of Google Analytics, highlighting useful features of the service and more complex data extraction strategies, and reports on the use of Google Analytics to evaluate the effectiveness of the Illinois Harvest Portal discovery layer [16].
Betty describes the implementation of Google Analytics to study use of online library screencast tutorials at Regis University. Metrics regarding the user software and connection speed allowed for analysis of the accessibility of the screencast videos [17]. Wagner and Arendt document the implementation of a Google Analytics-based data analysis program to analyze use of the website of the Morris Library at Southern Illinois University Carbondale and inform a site redesign. They discuss challenges in interpreting Google Analytics data to draw meaningful conclusions about web use and to support design change [18]. Turner emphasizes the importance of establishing performance goals for a library website and recommends using Google Analytics goal conversion features to evaluate the success of these goals [19].
Members of the archival community have also produced scholarship on the usefulness of web analytics data in the managing of online archival content. Prom provides an in-depth study of the use of Google Analytics at the University of Illinois Archives to provide data regarding use of online finding aid content. Prom’s study focuses on the use of data to inform site redesigns to improve user experience and encourage greater user engagement [20]. O’English describes the use of Google Analytics to study how library patrons used and discovered HTML archival finding aids at the Manuscripts, Archives, and Special Collections at the Washington State University Libraries, highlighting the importance of web traffic in finding aid visibility and discoverability [21]. Ament-Gjevick studies the use of multiple tools for web data gathering, including Google Analytics and Quantcast, and demonstrates the value of using referral site metrics to understand web use and web use trends to enhance accessibility [22].
The Value of Discoverability and Visibility
Tracking the research amongst the information service community regarding the use of web analytics clearly shows that scholarship in this area is firmly rooted in studies conducted by librarians who study and manage traditional online library sites that provide a defined, limited community with access to mainly-proprietary resources. These studies primarily focus on enhancing the experience of their community’s online users by analyzing web use data to inform site redesign and improvement. Members of the archival community who manage or study freely available digital repositories and have explored the value of web analytics have demonstrated the influence of many of the aforementioned researchers through their emphasis on internal site performance and experience. In addition, they have realized that, due to the uniqueness of their repositories, emphasis on online discoverability is also significant when studying analytics.
However, a review of research in this area demonstrated a significant lack of scholarship dedicated to the role of web analytics in developing a program specifically intended to increase the visibility, discovery, and use of digitized archival assets amongst a broad audience of web users. The literature has also largely ignored the value of web analytics in informing the development of web outreach programs using social media and other Web 2.0 outlets. Efforts to maximize the visibility and use of online materials are of great significance and value to the digital archivist. Visibility-raising efforts have the ability to increase the research impact and relevance of digital archives. However, digital archives are competing with a plethora of online materials and information for the interest and visitation of users who would be potentially interested in the subject matter documented. Thus, emphasis on discoverability is not only beneficial but essential in assuring the cultural relevance of digital archives.
The four questions and subsequent discussion below address four significant opportunities to increase the visibility of and traffic to digital archival objects that can be informed and evaluated through the analysis of data provided by Google Analytics. Each of the questions will be addressed through the description of specific queries in Google Analytics that can provide relevant data, the definition of metrics relevant to the query, and discussion of useful and productive analysis of the data returned from the queries. Discussion will also include relevant examples from the author’s analysis of web use data for the Ball State University DMR.
How can the effectiveness of digital outreach initiatives be evaluated and optimized?
The use of social media sites and the development of Web 2.0 content to promote library materials and services have been thoroughly explored in both recent scholarship and practice in the library and archival communities. Through the use of Google Analytics, digital archivists can gather data documenting the amount of web traffic directed towards a digital asset or page via social media outlets and other external sites and use this data to analyze the success of digital outreach initiatives and to improve the effectiveness of these efforts.
The Acquisition reporting section of Google Analytics provides access to valuable metrics describing the specific ways in which users discover a site’s content. The All Referrals report documents traffic referred to a site via links on other websites, listing the referring source (the website from which the visitor came) and the number of visitors directed from that site over any period of time. Other valuable data regarding the length and depth of a visit, the percentage of new visitors attracted, and the bounce rate (the percentage of single-page visits [23]) is also presented for each individual source [24]. Each source can be individually selected and isolated, presenting a report that visualizes the number of visitors referred via that exact source for any specific, selected time and breaks down the traffic via that source into multiple, unique referral paths (the specific page on the referral site from which the traffic came [25]), if applicable.
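The bounce rate definition above reduces to a simple ratio. The following is a minimal sketch of that arithmetic, assuming per-source visit counts have been copied out of the report by hand; the source names and numbers are invented for illustration and do not come from the DMR data.

```python
def bounce_rate(single_page_visits: int, total_visits: int) -> float:
    """Return the percentage of visits that viewed only one page."""
    if total_visits == 0:
        return 0.0
    return 100.0 * single_page_visits / total_visits


# Illustrative rows: (source, total visits, single-page visits).
# These values are made up; real ones would come from the All Referrals report.
rows = [
    ("en.wikipedia.org", 200, 150),
    ("example-forum.org", 50, 20),
]

for source, visits, bounces in rows:
    print(f"{source}: {bounce_rate(bounces, visits):.1f}% bounce rate")
```

A lower bounce rate for a source suggests that visitors arriving from it tend to continue on to other pages in the repository.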
Data gathered from this report can be used to increase traffic to a site in several ways. The data can determine which referral sites are generally most fruitful in directing traffic to a digital library, allowing archivists to spend more resources promoting their materials via the most fruitful sources. The data can also show which specific instances of a particular type of outreach were most successful, allowing archivists to hypothesize about the factors that make a particular outreach instance attractive or successful. In promoting assets from the Ball State University DMR, Google Analytics data was used in both of these ways to inform digital outreach efforts to maximize traffic.
At Ball State University, the use of Wikipedia has been explored as a means of raising the visibility and discoverability of digital archival objects in the Digital Media Repository. The efforts were heightened in 2011, when 144 links to DMR objects were added to relevant, already-existing Wikipedia pages, with links typically added to the External Links, References, and Further Reading sections of Wikipedia articles. Google Analytics was used to track the number of visitors who arrived at the DMR via Wikipedia. The data gathered from the All Referrals report demonstrated that, from September to December 2011, 5,663 visitors arrived at the DMR via Wikipedia, representing 11.82% of our total traffic. This data was very encouraging and justified the continued use of Wikipedia to raise the visibility of the DMR and its assets.
Data regarding the number of visitors who arrived via specific Wikipedia articles was available by selecting the general Wikipedia source in the All Referrals report. This data helped identify which types of Wikipedia articles attracted the most visitors to the DMR, showing that links to digitized historic sheet music and links to published, freely accessible works of scholarship tended to be especially high sources of traffic. Using this data, the adding of links to these two types of materials was further explored.
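The two uses of referral data described in this section, measuring a source's share of total traffic and ranking the individual referral paths within a source, can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, the site-wide total, and the function names are hypothetical, and the data is assumed to have been exported or transcribed from the All Referrals report.

```python
from collections import defaultdict

# Hypothetical rows: (source, referral path, visits). All values are made up.
referrals = [
    ("en.wikipedia.org", "/wiki/Article_A", 120),
    ("en.wikipedia.org", "/wiki/Article_B", 80),
    ("example-blog.org", "/post/1", 50),
]


def visits_by_source(rows):
    """Sum visits for each referring source."""
    totals = defaultdict(int)
    for source, _path, visits in rows:
        totals[source] += visits
    return dict(totals)


def share_of_traffic(source_visits: int, total_site_visits: int) -> float:
    """Percentage of all site visits that arrived from one source."""
    return round(100.0 * source_visits / total_site_visits, 2)


def top_paths(rows, source):
    """Referral paths for one source, busiest first."""
    paths = [(path, visits) for src, path, visits in rows if src == source]
    return sorted(paths, key=lambda pv: pv[1], reverse=True)


totals = visits_by_source(referrals)
total_site_visits = 1000  # hypothetical site-wide total, including non-referral traffic
print(share_of_traffic(totals["en.wikipedia.org"], total_site_visits))
print(top_paths(referrals, "en.wikipedia.org"))
```

Ranking the paths in this way is one route to spotting which kinds of linked articles, such as those on sheet music, send the most visitors.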
How can archivists uncover new digital outreach opportunities?
Online outreach efforts initiated by librarians and archivists are only one component of the sum total of digital promotion that can raise the visibility of digital archival materials. Third party sharing, in which online users discover archival materials and link to and discuss them on any number of Web 2.0 or social media platforms, can be crucial in reaching new audiences for online archives. The web contains a plethora of communities and sites that would provide unique, creative targeted digital outreach initiatives, many of which are unexplored. By using the All Referrals report in Google Analytics, digital archivists can have access to data documenting third party sharing and its impact on web traffic. By analyzing the list of sources in this report, archivists can identify sites and online communities that were especially fruitful traffic sources; this information can be used to develop archives-generated outreach efforts that can further raise the visibility of digital archival content.
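One way to surface third-party sharing in a list of referral sources is to set aside the sites where the archive itself posts links and look at what remains. The sketch below assumes a hand-maintained list of known outreach venues and a minimum-visit threshold; both are hypothetical choices, as are the example hostnames and counts.

```python
# Sites where the archive posts its own links (hypothetical list).
KNOWN_OUTREACH = {"en.wikipedia.org", "facebook.com"}


def third_party_sources(source_visits, known=KNOWN_OUTREACH, min_visits=10):
    """Return (source, visits) pairs not in `known` with at least
    `min_visits` visits, busiest first."""
    candidates = [
        (source, visits)
        for source, visits in source_visits.items()
        if source not in known and visits >= min_visits
    ]
    return sorted(candidates, key=lambda sv: sv[1], reverse=True)


# Illustrative totals per source; all values are made up.
source_visits = {
    "en.wikipedia.org": 500,
    "racing-forum.example": 140,
    "history-blog.example": 35,
    "rare-site.example": 3,
}

print(third_party_sources(source_visits))
```

The sources that survive the filter are candidates for archives-generated outreach in the same or similar communities.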
At Ball State University, using Google Analytics to document third-party sharing shed light on new digital outreach opportunities for DMR assets. In the spring of 2012, the DMR published a collection of over 800 digitized historic photographs taken by photographer and editor Ralph Satterlee documenting the Indianapolis 500 from 1960 to 1974 [26]. Within a year, links to materials from this collection were posted to four separate auto racing message boards, mb2501.proboards.com, indycarmodeling.proboards.com, trackforum.com, and forums.autosport.com/, by a third-party user. In total, links posted on these four traffic sources accounted for the discovery of DMR assets from this collection by 370 users from May 1, 2012, to May 1, 2013.
The discovery of the fruitfulness of these third-party activities provided evidence of the value of sports message boards as a venue for digital archives promotion. In the fall of 2012 and winter of 2013, the author posted links to bsufans.com, a Ball State University sports message board, promoting digitized films of historic Ball State football and basketball games that had been made available in the DMR. During the same one-year period, the message board directed 184 users to the DMR.
How can the effectiveness of descriptive metadata in facilitating asset discovery be improved?
Descriptive standards have traditionally played a singular role in determining how digital archival assets are described and what descriptive information should be created to facilitate access. However, as Schaffner points out, digital librarians and archivists must consider user needs and behaviors when describing digital objects. “One of several core competencies that special collections metadata librarians must have is ‘a keen understanding of users’ needs and preferences,’” she writes. “This is especially important now that discovery happens in multiple environments. Librarians and archivists need to manage archival collections by provenance, but also must describe what is in the collections for their users [27].” Google Analytics provides digital archivists with an opportunity to gain an understanding of user needs by offering data regarding search engine queries that connected users with relevant digital archives content. This data can inform Search Engine Optimization (SEO), efforts to raise the discoverability of assets via a search engine, allowing archivists to describe digital assets to facilitate increased use.
Web traffic data provided in case studies by both O’English [28] and Prom [29] demonstrate the overwhelming significance of search engine traffic in the use of digitized archival content. By utilizing the Organic Keywords report in Google Analytics, digital archivists can see a list of web search engine searches that successfully brought web users to digital archives content. This report is presented in the Acquisition reporting section. By accessing the content of searches that connected web searchers with archival resources, archivists can better learn how to describe assets in ways that make them discoverable to a greater number of potential users.
At Ball State University, the Organic Keywords report was used to inform the research and creation of descriptive metadata for a large, growing collection of photographs documenting Ball State’s history that were digitized and added to the DMR. As the collection grew and more assets were added to the DMR, this report was used to evaluate how researching the names of individual persons in the historic photographs and adding them to the object metadata affected web search engine traffic to the digitized photographs. To isolate the keywords that specifically produced hits on the Ball State University Campus Photographs collection, a more complex process was required. In the Behavior reporting section, the Landing Pages report (organized under Site Content) was selected, which shows a list of all of the first pages in each user’s session [30]. Keyword was selected as a secondary dimension, showing the search engine query that led users to the specific landing pages when applicable. The filter was then used to isolate landing pages only from the Ball State University Campus Photographs collection. From June 1, 2013 to September 1, 2013, keyword data was available for 388 site entries via the Campus Photographs collection through search engine traffic. Of these 388 visits, 137 (35.31%) were the result of a search engine query that involved an individual person’s name. This data suggests that researching and providing names of individuals pictured in photographs, when possible, had a significant impact on the visibility of and traffic to digitized historical photographs in this collection.
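The filtering and counting steps described above can be approximated outside the Google Analytics interface once landing page and keyword pairs have been exported. The sketch below is not the author's procedure; it assumes a hypothetical URL marker for the collection and a hand-curated set of personal names, and the example rows are invented. The final line simply reproduces the percentage reported in the text from the two counts given there.

```python
# Hypothetical URL marker and hand-curated names; neither comes from the article.
COLLECTION_MARKER = "/campus-photos/"
KNOWN_NAMES = {"jane doe", "john smith"}

# Illustrative rows: (landing page path, search keyword). All values are made up.
rows = [
    ("/campus-photos/id/1", "jane doe ball state 1950"),
    ("/campus-photos/id/2", "ball state administration building"),
    ("/sheet-music/id/7", "john smith waltz"),
    ("/campus-photos/id/3", "John Smith basketball coach"),
]


def mentions_person(keyword: str) -> bool:
    """True if the keyword contains any curated personal name."""
    lowered = keyword.lower()
    return any(name in lowered for name in KNOWN_NAMES)


def name_query_share(rows, marker):
    """Return (name queries, all queries, percentage) for one collection."""
    keywords = [kw for path, kw in rows if marker in path]
    if not keywords:
        return 0, 0, 0.0
    with_names = sum(1 for kw in keywords if mentions_person(kw))
    return with_names, len(keywords), round(100.0 * with_names / len(keywords), 2)


print(name_query_share(rows, COLLECTION_MARKER))

# The figures reported in the text: 137 of 388 keyword-bearing entries.
print(round(100.0 * 137 / 388, 2))
```

In practice, deciding whether a query contains a personal name is likely to need manual review; the substring match here is only a stand-in for that judgment.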
How can digital archivists make more informed selection decisions in digital collection building?
As investment in digital archives initiatives continues, the selection of materials to make available online will remain a pivotal aspect of the management and growth of online archival repositories. By tracking and analyzing web analytics data, archivists can gain access to information to support informed decision making regarding selection for digitization. Gertz writes, on the matter of selection criteria, that “demand from users is vital. Digitizing and mounting materials publicly is a form of publishing, and success in publishing means knowing and targeting the audience [31].” Archivists must not simply extrapolate digital demand from in-house use, as the user base for in-person patrons and online users varies greatly. Instead, they must seek to discover hard data to measure the demand for digitized archival assets.
Google Analytics’ Behavior reporting section offers reports that provide data regarding the specific pages that are most heavily visited on a website. The All Pages report (organized under Site Content) provides a list of all pages on a site that have been viewed, the number of page views that each page has received over any specified time (with a page view being defined as any instance of the loading of a page by a user [32]), and other data regarding a user’s interaction with that page [33]. Pages can be viewed with the unique page URL or the page title as the primary dimension. The Landing Pages report provides similar data, but isolates only the pages through which a user first entered the site. This data can provide archivists with a thorough understanding of which items and collections are most heavily used and which are most successful in drawing users to the repository. The advanced filter option provides digital curators with the dexterity needed to isolate certain subsets of material based on common URL strings or page title elements.
At Ball State University, the All Pages and Landing Pages reports helped identify the most heavily-used collections in the DMR and shaped an understanding of user demand. In turn, we used this knowledge to inform our selection decisions, seeking to expand our repository’s offerings in areas that had exhibited high use. The use of the All Pages and Landing Pages reports showed that the Hague Sheet Music collection, a DMR collection containing over 100 pieces of digitized popular sheet music from mainly the nineteenth and twentieth centuries, was one of the most heavily used collections in the repository. Although the DMR contains over 100,000 items, during the 2012 calendar year, of the 1,118,044 total page views logged on the DMR, 91,963 of them were recorded on Hague Sheet Music pages and objects. Assets from this collection were isolated in Google Analytics using the Advanced Filter feature. These archival sheet music objects were often discovered through Google and other search engines and were also easily integrated into our Wikipedia outreach strategy. The data gathered regarding these materials suggested the existence of a significant demand for historic sheet music amongst our web users, and as a result, digitization has begun on other sheet music collections in the Ball State University Archives & Special Collections to further meet this demonstrated demand.
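Isolating a collection by a common URL string and measuring its share of page views, as described above, can also be sketched against exported All Pages data. The URL marker and example rows below are hypothetical; only the two totals in the final line come from the text, where they work out to about 8.2 percent of all 2012 page views.

```python
def pageview_share(rows, marker: str) -> float:
    """Percentage of all page views recorded on pages whose path contains `marker`."""
    total = sum(views for _path, views in rows)
    if total == 0:
        return 0.0
    matched = sum(views for path, views in rows if marker in path)
    return round(100.0 * matched / total, 2)


# Illustrative rows: (page path, page views). All values are made up.
rows = [
    ("/sheet-music/id/1", 300),
    ("/sheet-music/id/2", 200),
    ("/campus-photos/id/9", 500),
]

print(pageview_share(rows, "/sheet-music/"))

# The figures reported in the text: 91,963 of 1,118,044 page views.
print(round(100.0 * 91963 / 1118044, 2))
```

Comparing this share with the collection's share of items in the repository gives a rough sense of whether a collection is drawing disproportionate use.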
Conclusion
Research has documented that Google Analytics can provide valuable data to enhance the understanding of a digital library user’s experience and implement internal site improvements. However, the service also provides valuable data to inform the development and refinement of a digital outreach strategy to maximize the visibility of digital archives assets and to guide the description and selection of archival materials to promote widespread discovery and use of archival content. The features demonstrated provide archivists and digital library curators with easily-accessible, flexible data reports that can enhance archivists’ understanding of the relationship between content and web users, both existing and potential. Archivists should be encouraged to explore web analytics, as knowledge of users and content is fundamental to decision-making, collection building, and outreach in archives. Through this practice, archivists can access the data needed to support the decision-making necessary to maximize the research impact and relevance of digital archives.
[…] Using Google Analytics Data to Expand Discovery and Use of Digital Archival Content. (2017). Practical Technology for Archives. Retrieved 22 June 2017, from https://practicaltechnologyforarchives.org/issue1_szajewski/ […]