🤖 AI Summary
This study addresses the inefficient reuse of industrial spare parts across sites and organizations, which is hindered by decentralized storage, inconsistent naming conventions, and missing information. The authors propose PhRAG, a framework that combines multitask generative language modeling, named entity recognition (NER), and hybrid retrieval-augmented generation (RAG) to build a unified Virtual Stock Pool (VSPool) from heterogeneous unstructured data. The approach targets structured information extraction under data-scarce conditions and supports natural language querying with generated justifications for the retrieved parts. The framework shows the potential of generative approaches over traditional NER methods for technical specification extraction and makes retrieval more transparent than standard information retrieval systems.
📝 Abstract
Maintenance organizations in manufacturing try to avoid downtime and unnecessary purchasing by reusing existing assets, yet the main obstacle is not a lack of parts but a lack of actionable visibility across sites and partners. Inventories are distributed, described with inconsistent naming conventions, and contain duplicates and partially specified references, so the right part often exists somewhere but remains effectively undiscoverable. The paper proposes PhRAG, a hybrid Retrieval-Augmented Generation (RAG) framework for pooling this fragmented landscape into a Virtual Stock Pool (VSPool) that can be structured and searched as a single resource. Unstructured, heterogeneous spare part descriptions are structured via Named Entity Recognition (NER) into a shared virtual pool dataset and indexed to support robust retrieval even when users express needs in natural language rather than exact technical specifications. The proposed modular pipeline leverages the multitasking nature of generative language models to cover two dimensions that make industrial parts pooling challenging: (i) unstructured technical specifications from diverse data sources (e.g. new partners, catalogs, marketplace listings) are handled through an offline extraction step, and (ii) request variability at runtime (references, partial references, specifications, price/condition constraints) is handled through a hybrid RAG-based search engine capable of retrieving relevant components and justifying results. The framework demonstrates the potential of generative approaches over traditional NER approaches for technical specification extraction under data scarcity, and it overcomes the opacity of standard information retrieval systems by generating justifications for retrieved components. The project's open-source code can be found at https://github.com/roccofelici/vspool.
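As a rough illustration of the runtime side of the pipeline described above, the following Python sketch ranks already-structured pool records against a request that mixes a partial reference, free-text specifications, and a price constraint, and attaches a short justification to each hit. It is not taken from the project's repository: the `Part` fields, the `score` and `search` functions, the sample records, and the scoring weights are all hypothetical. Jaccard token overlap stands in for whatever retrieval model the authors use, and a templated string stands in for the generated justification.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One structured record in the pooled dataset (field names are hypothetical)."""
    reference: str
    description: str
    specs: dict = field(default_factory=dict)
    price: float = 0.0
    site: str = ""


def tokens(text: str) -> set:
    """Lowercase word set used for a crude lexical overlap score."""
    return set(text.lower().replace(",", " ").split())


def score(part: Part, query: str, reference_hint: str = "") -> tuple:
    """Combine a reference match with lexical overlap; return (score, reasons)."""
    reasons = []
    total = 0.0
    # Exact signal: full or partial reference match.
    if reference_hint and reference_hint.lower() in part.reference.lower():
        total += 1.0
        reasons.append(f"reference contains '{reference_hint}'")
    # Fuzzy signal: Jaccard overlap between the query and description/spec text.
    spec_text = " ".join(f"{k} {v}" for k, v in part.specs.items())
    doc = tokens(part.description + " " + spec_text)
    q = tokens(query)
    shared = q & doc
    if q and doc:
        total += len(shared) / len(q | doc)
    if shared:
        reasons.append("shared terms: " + ", ".join(sorted(shared)))
    return total, reasons


def search(pool, query, reference_hint="", max_price=None, top_k=3):
    """Filter on hard constraints, rank by hybrid score, attach a justification."""
    results = []
    for part in pool:
        # Hard constraint: drop parts above the requested price ceiling.
        if max_price is not None and part.price > max_price:
            continue
        s, reasons = score(part, query, reference_hint)
        if s > 0:
            results.append((s, part, "; ".join(reasons)))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


# Invented sample records, standing in for the output of the offline extraction step.
pool = [
    Part("6205-2RS", "deep groove ball bearing sealed", {"bore": "25mm"}, 12.0, "site-A"),
    Part("6305-ZZ", "deep groove ball bearing shielded", {"bore": "25mm"}, 30.0, "site-B"),
    Part("M8X40", "hex bolt steel", {"thread": "M8"}, 0.5, "site-A"),
]

hits = search(pool, "sealed ball bearing 25mm", reference_hint="6205", max_price=20.0)
for s, part, why in hits:
    print(f"{part.reference} @ {part.site}: {why}")
```

Keeping the price ceiling as a hard filter and the reference and text signals as additive scores is only one possible design; a real system would likely replace the overlap score with an embedding-based similarity and let a language model write the justification from the retrieved record.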