🤖 AI Summary
Problem: Research software is often poorly identified in scholarly publications, which hinders its discoverability, attribution, and reuse. Method: This paper presents the SoFAIR workflow, which uses machine learning tools to extract software mentions from research papers, connects repository systems and authors with services such as HAL and Software Heritage, and integrates the COAR Notify protocol so that repositories and authors can exchange automated, interoperable notifications to validate and disseminate those mentions. The workflow is designed in alignment with the FAIR principles (Findable, Accessible, Interoperable, Reusable). Contribution/Results: The paper outlines the workflow and the COAR Notify implementation, which together support identification, archiving, and standardized citation of research software. It argues that this approach can improve the visibility and credibility of research software in academic literature and advance its recognition as a first-class scholarly output.
📝 Abstract
The discoverability, attribution, and reusability of open research software are often hindered by its obscurity within academic manuscripts. To address this, the SoFAIR project (2024-2025) introduces a comprehensive workflow leveraging machine learning tools for extracting software mentions from research papers. The project integrates repository systems, authors, and services like HAL and Software Heritage to ensure proper archiving, citation, and accessibility of research software in alignment with FAIR principles. To connect these systems, we present an integration of the COAR Notify Protocol, which facilitates automated, interoperable communication among repositories and authors to validate and disseminate software mentions. This paper outlines the SoFAIR workflow and the implementation of the COAR Notify Protocol, emphasising its potential to enhance the visibility and credibility of research software as first-class bibliographic records.
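To make the notification step more concrete, the sketch below builds the kind of JSON payload one repository might send to another to announce that a paper mentions a piece of software. It is an illustration only, not the SoFAIR implementation: the function name, all URLs, and the relationship IRI are hypothetical, and the field layout reflects a general reading of Activity Streams 2.0 conventions as used by COAR Notify, so it should be checked against the COAR Notify specification before use.

```python
import json
import uuid


def build_software_mention_notification(paper_url, software_url,
                                        origin_id, origin_inbox,
                                        target_id, target_inbox):
    """Build a notification payload linking a paper to a software record.

    Hypothetical helper: field names are assumed from Activity Streams 2.0
    conventions and are not taken from the SoFAIR paper.
    """
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://coar-notify.net",
        ],
        "id": f"urn:uuid:{uuid.uuid4()}",
        # Assumed activity type for announcing a relationship between resources.
        "type": ["Announce", "coar-notify:RelationshipAction"],
        "origin": {"id": origin_id, "inbox": origin_inbox, "type": "Service"},
        "target": {"id": target_id, "inbox": target_inbox, "type": "Service"},
        "object": {
            "id": f"urn:uuid:{uuid.uuid4()}",
            "type": "Relationship",
            "as:subject": paper_url,
            # Placeholder IRI; a real deployment would use an agreed vocabulary term.
            "as:relationship": "https://example.org/vocab#mentionsSoftware",
            "as:object": software_url,
        },
    }


if __name__ == "__main__":
    notification = build_software_mention_notification(
        paper_url="https://repository.example.org/paper/123",
        software_url="https://archive.example.org/software/abc",
        origin_id="https://extractor.example.org",
        origin_inbox="https://extractor.example.org/inbox",
        target_id="https://repository.example.org",
        target_inbox="https://repository.example.org/inbox",
    )
    # In practice the payload would be sent via HTTP POST to the target inbox.
    print(json.dumps(notification, indent=2))
```

The receiving repository could then ask the author to confirm or reject the mention and reply with its own notification, which corresponds to the validation step the abstract describes.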