Abstract
Large-scale digitization efforts by third-party firms have attracted no small amount of controversy and criticism, especially in the case of Google Books. This article reports some of the findings and important implications of a rigorous multi-year quantitative and qualitative assessment of images from a sizable proportion of the digital surrogates that Google created and deposited in the HathiTrust, one of the most important large-scale preservation initiatives to emerge in higher education in the past fifty years. The population of study described here consists of English-language books and serials published before 1923 that Google scanned and processed between 2004 and 2010. When the data for the study were gathered (2011), this population comprised approximately 1.25 million volumes, or roughly 12 percent of the HathiTrust corpus. The findings suggest that imperfection in digital surrogates is an obvious and nearly ubiquitous feature of Google Books, and that such imperfection has become, and will remain, firmly ensconced in collaborative preservation repositories.
© 2013 by Walter de Gruyter GmbH & Co.