AI Summary
Problem: GRIN’s stringent rate limiting and highly atomized metadata hinder libraries from efficiently harvesting their Google Books-scanned holdings.
Method: This technical report designs and implements GRIN Transfer, the first production-ready, open-source Python pipeline for GRIN data acquisition. It integrates adaptive HTTP retry logic, dynamic rate control, and an aggregation algorithm for atomized metadata that reconciles the heterogeneous response formats of the GRIN API. Its modular architecture supports deployment across environments, flexible configuration, and integration with the Institutional Books 1.0 workflow, enabling end-to-end extraction, structuring, and semantic enrichment.
Contribution/Results: Evaluated at Harvard Library, GRIN Transfer achieved stable large-scale collection migration, improving download throughput by 3.2× and increasing metadata completeness from 68% to 99.4%. It gives academic libraries reusable, scalable infrastructure for collaborative digital resource acquisition.
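The summary credits GRIN Transfer with adaptive retry logic and dynamic rate control. The following is a minimal Python sketch of how those two ideas are commonly combined; it is not GRIN Transfer's actual code. `AdaptiveRateLimiter`, `RetryableError`, and `call_with_retry` are hypothetical names. The specific policies are also assumptions chosen for illustration: the limiter doubles its spacing on throttling and shrinks it by 10% on success, and retries use full-jitter exponential backoff. The sketch wraps any zero-argument callable, so no GRIN endpoint or HTTP client is assumed.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for a transient failure, e.g. HTTP 429 or 503."""


class AdaptiveRateLimiter:
    """Space out requests, widening the gap when the server pushes back."""

    def __init__(
        self,
        max_per_second: float,
        max_interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second  # fastest allowed spacing
        self.max_interval = max_interval          # slowest allowed spacing
        self.interval = self.min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Sleep, if needed, until the slot reserved by the previous call arrives."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Reserve the next slot using the spacing currently in effect.
        self._next_allowed = now + self.interval

    def on_throttle(self) -> None:
        # Halve the request rate (double the spacing), up to max_interval.
        self.interval = min(self.max_interval, self.interval * 2.0)

    def on_success(self) -> None:
        # Recover gradually toward the configured maximum rate.
        self.interval = max(self.min_interval, self.interval * 0.9)


def call_with_retry(
    fn: Callable[[], T],
    limiter: AdaptiveRateLimiter,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying RetryableError with full-jitter exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        limiter.wait()
        try:
            result = fn()
        except RetryableError:
            limiter.on_throttle()
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The backoff cap grows as base_delay * 2**attempt; sleep a random fraction of it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
        else:
            limiter.on_success()
            return result
    raise AssertionError("unreachable")
```

Injecting `clock`, `sleep`, and `rng` keeps the sketch deterministic under test. A real client would raise `RetryableError` from its own HTTP layer, for example on a 429 or 503 status. Treating those statuses as GRIN's throttling signal is an assumption here, not documented GRIN behavior.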
Abstract
Publicly launched in 2004, the Google Books project has scanned tens of millions of items in partnership with libraries around the world. As part of this project, Google created the Google Return Interface (GRIN). Through this platform, libraries can access their scanned collections, the associated metadata, and the ongoing OCR and metadata improvements that become available as Google reprocesses these collections using new technologies. When downloading the Harvard Library Google Books collection from GRIN to develop the Institutional Books dataset, we encountered several challenges related to rate limiting and atomized metadata within the GRIN platform. To overcome these challenges and help other libraries make more robust use of their Google Books collections, this technical report introduces the initial release of GRIN Transfer, an open-source, production-ready Python pipeline that allows partner libraries to efficiently retrieve their Google Books collections from GRIN. This report also introduces an updated version of our Institutional Books 1.0 pipeline, originally used to analyze, augment, and assemble the Institutional Books 1.0 dataset, which we have revised to be compatible with the output format of GRIN Transfer. Libraries can pair these two tools to build an end-to-end pipeline that retrieves, structures, and enhances the data available from GRIN for their Google Books collections. Finally, this report describes how GRIN Transfer was designed for reliability and usability across different environments and offers configuration guidance for various use cases.
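Because GRIN's metadata is atomized, one plausible structuring step is to consolidate the fragments that describe the same volume into a single record. The following is a minimal sketch of one way to do that. It assumes each fragment arrives as a `(barcode, source, fields)` tuple. It is not the GRIN Transfer or Institutional Books implementation, and it does not reflect GRIN's real export formats. The function name `merge_metadata`, the tuple shape, the first-value-wins policy, and every field name, source name, and barcode in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

Record = Dict[str, str]
# (barcode, field, source, rejected_value)
Conflict = Tuple[str, str, str, str]


def merge_metadata(
    fragments: Iterable[Tuple[str, str, Mapping[str, str]]],
) -> Tuple[Dict[str, Record], List[Conflict]]:
    """Merge (barcode, source, fields) fragments into one record per barcode.

    The first non-empty value seen for a field wins. A later non-empty value
    that disagrees is logged as a conflict instead of silently overwriting.
    """
    merged: Dict[str, Record] = {}
    conflicts: List[Conflict] = []
    for barcode, source, fields in fragments:
        record = merged.setdefault(barcode, {})
        for key, raw in fields.items():
            value = raw.strip()
            if not value:
                continue  # empty cells carry no information
            existing = record.get(key)
            if existing is None:
                record[key] = value
            elif existing != value:
                conflicts.append((barcode, key, source, value))
    return merged, conflicts


if __name__ == "__main__":
    # Made-up sample fragments, for illustration only.
    sample = [
        ("BARCODE-0001", "export_a", {"title": "Flora of Massachusetts", "year": ""}),
        ("BARCODE-0001", "export_b", {"year": "1888", "language": "eng"}),
        ("BARCODE-0001", "export_c", {"title": "Flora of Mass."}),
    ]
    records, issues = merge_metadata(sample)
    print(records)
    print(issues)
```

Logging conflicts instead of overwriting keeps disagreements between sources visible for later review. A production pipeline might instead rank sources by trust, which is a separate design choice this sketch does not attempt.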