Automating Information Extraction and Retrieval for Industrial Spare Parts Pooling

📅 2026-06-02

🤖 AI Summary

This study addresses the inefficient cross-site and cross-organizational reuse of industrial spare parts, which is hindered by decentralized storage, inconsistent naming conventions, and missing information. To overcome these issues, the authors propose PhRAG, a framework that combines the multitask capabilities of generative language models with named entity recognition (NER) and hybrid retrieval-augmented generation (RAG) to build a unified Virtual Stock Pool (VSPool) from heterogeneous, unstructured data. The approach targets robust structured information extraction under data-scarce conditions and supports natural-language queries whose results come with generated, interpretable justifications. The authors report that generative extraction compares favorably with conventional NER for technical specification extraction when data are scarce, and that the generated justifications make retrieval more transparent than standard information retrieval systems, supporting more efficient reuse of spare parts across organizations.
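The offline structuring step can be illustrated with a minimal sketch. PhRAG itself relies on a generative language model for extraction; the regular expressions here are only a rule-based stand-in, and the record schema (`PartRecord`), the manufacturer list, the field names, and the helper functions are all hypothetical. What the sketch shows is how entries from different sites collapse into a single pool record once their references are normalized.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical manufacturer vocabulary; a real pool would curate or learn this.
KNOWN_MAKERS = ("SKF", "FAG", "NSK")
# Token of 5+ letters/digits/hyphens that contains a digit, e.g. "6205-2RS".
# A crude stand-in for a generative NER "reference" entity.
REF_RE = re.compile(r"\b(?=[A-Z0-9-]*\d)[A-Z0-9][A-Z0-9-]{4,}\b")
DIM_RE = re.compile(r"(\d+(?:\.\d+)?)\s*MM\b")


@dataclass
class PartRecord:
    reference: str                       # normalized reference, used as pool key
    manufacturer: Optional[str] = None
    specs: dict = field(default_factory=dict)
    sites: set = field(default_factory=set)


def normalize_ref(ref: str) -> str:
    """Collapse naming variants such as '6205-2RS' and '62052RS' to one key."""
    return re.sub(r"[^A-Z0-9]", "", ref.upper())


def extract(description: str, site: str) -> Optional[PartRecord]:
    text = description.upper()
    ref = REF_RE.search(text)
    if ref is None:
        return None  # partially specified entry: no usable reference found
    maker = next((m for m in KNOWN_MAKERS if re.search(rf"\b{m}\b", text)), None)
    specs = {}
    dim = DIM_RE.search(text)
    if dim:
        specs["dimension_mm"] = dim.group(1)
    return PartRecord(normalize_ref(ref.group(0)), maker, specs, {site})


def build_pool(rows):
    """Merge per-site records that share a normalized reference."""
    pool = {}
    for description, site in rows:
        rec = extract(description, site)
        if rec is None:
            continue
        if rec.reference not in pool:
            pool[rec.reference] = rec
            continue
        existing = pool[rec.reference]
        existing.sites |= rec.sites
        existing.manufacturer = existing.manufacturer or rec.manufacturer
        for key, value in rec.specs.items():
            existing.specs.setdefault(key, value)  # keep the first-seen value
    return pool


rows = [
    ("SKF bearing 6205-2RS, 25mm bore", "site-A"),
    ("Roulement SKF 62052RS d=25 mm", "partner-B"),
    ("Motor, spare, ref unknown", "site-C"),
]
pool = build_pool(rows)
```

Normalizing the reference before merging is what lets the two bearing listings from site-A and partner-B land in one pool entry, while the entry without any reference is set aside rather than guessed.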

📝 Abstract

Maintenance organizations in manufacturing try to avoid downtime and unnecessary purchasing by reusing existing assets, but the main obstacle is not a lack of parts but a lack of actionable visibility across sites and partners. Inventories are distributed, described with inconsistent naming conventions, and contain duplicates and partially specified references, so the right part often exists somewhere but remains effectively undiscoverable. The paper proposes PhRAG, a hybrid Retrieval-Augmented Generation framework for Pooling, which consolidates this fragmented landscape into a Virtual Stock Pool (VSPool) that can be structured and searched as a single resource. Unstructured, heterogeneous spare part descriptions are structured via Named Entity Recognition (NER) into a shared virtual pool dataset, which is indexed to support robust retrieval even when users express their needs in natural language rather than as exact technical specifications. The proposed modular pipeline leverages the multitasking nature of generative language models to cover the two dimensions that make industrial parts pooling challenging: (i) unstructured technical specifications from diverse data sources (e.g., new partners, catalogs, marketplace listings) are handled through an offline extraction step, and (ii) request variability at runtime (references, partial references, specifications, price/condition constraints) is handled through a hybrid RAG-based search engine capable of retrieving relevant components and justifying the results. The framework demonstrates the potential of generative approaches, compared with traditional NER approaches, for extracting technical specifications under data scarcity, and it overcomes the opacity of standard information retrieval systems by generating justifications for retrieved components. The project's open-source code is available at https://github.com/roccofelici/vspool.
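The runtime side, hybrid retrieval under price/condition constraints, can be sketched as follows. This is not the PhRAG search engine: token overlap stands in for a lexical scorer such as BM25, character-trigram cosine stands in for dense embedding similarity, and the two rankings are merged with reciprocal rank fusion (RRF), a common fusion choice that the abstract does not specify. `Listing`, `hybrid_search`, and the sample data are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Listing:
    reference: str
    description: str
    price: float
    condition: str  # e.g. "new" or "used"


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_score(query: str, doc: str) -> float:
    """Share of query tokens found in the document (stand-in for BM25)."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0


def trigrams(text: str) -> Counter:
    s = re.sub(r"[^a-z0-9]", "", text.lower())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def semantic_score(query: str, doc: str) -> float:
    """Character-trigram cosine (stand-in for dense embedding similarity);
    tolerant of reformatted references such as '6205 2rs' vs '6205-2RS'."""
    a, b = trigrams(query), trigrams(doc)
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_search(query: str, listings: List[Listing], max_price: Optional[float] = None,
                  condition: Optional[str] = None, k: int = 60, top_n: int = 3) -> List[Listing]:
    # 1) Hard constraints (price, condition) prune candidates before ranking.
    cands = [item for item in listings
             if (max_price is None or item.price <= max_price)
             and (condition is None or item.condition == condition)]
    fused = Counter()
    for scorer in (lexical_score, semantic_score):
        # 2) Rank candidates separately with each scorer, dropping zero scores.
        scored = [(scorer(query, f"{item.reference} {item.description}"), item) for item in cands]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        # 3) Reciprocal rank fusion: each ranked list contributes 1 / (k + rank).
        for rank, (_, item) in enumerate(ranked, start=1):
            fused[item.reference] += 1.0 / (k + rank)
    by_ref = {item.reference: item for item in cands}
    return [by_ref[ref] for ref, _ in fused.most_common(top_n)]


listings = [
    Listing("6205-2RS", "SKF deep groove ball bearing, 25 mm bore, sealed", 12.0, "new"),
    Listing("6206-2RS", "SKF deep groove ball bearing, 30 mm bore, sealed", 14.0, "new"),
    Listing("6205-ZZ", "FAG ball bearing, 25 mm bore, metal shields", 9.5, "used"),
    Listing("VFD-2.2KW", "Variable frequency drive, 2.2 kW", 310.0, "new"),
]
results = hybrid_search("sealed bearing 6205 2rs", listings, max_price=20, condition="new")
```

Filtering on hard constraints before ranking keeps a used or over-budget item from being rescued by a high similarity score. RRF is used in this sketch because it only needs ranks, so the two scorers' incompatible scales never have to be calibrated against each other.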

Problem

Research questions and friction points this paper is trying to address.

spare parts pooling

information extraction

retrieval

inventory visibility

industrial maintenance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation

Named Entity Recognition

Virtual Stock Pool
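The generation half of Retrieval-Augmented Generation is what lets the system justify its results. One way this could look, assuming a purely illustrative prompt format and record fields (`build_justification_prompt` and `citations_are_grounded` are hypothetical helpers, and no specific language model API is assumed), is to number the retrieved VSPool records, ask the model to cite them, and reject answers that cite records outside the retrieved set.

```python
import re


def build_justification_prompt(query: str, records: list) -> str:
    """Number retrieved records so the model can cite them (hypothetical format)."""
    lines = [
        "Match the maintenance request to the pooled spare parts below.",
        "Use ONLY these records and cite them by ID, e.g. [R1].",
        f"Request: {query}",
        "Records:",
    ]
    for i, rec in enumerate(records, start=1):
        fields = "; ".join(f"{key}={value}" for key, value in rec.items())
        lines.append(f"[R{i}] {fields}")
    lines.append("For each record, say whether it satisfies the request and why.")
    return "\n".join(lines)


def citations_are_grounded(answer: str, n_records: int) -> bool:
    """Accept a justification only if it cites at least one record and
    every cited ID refers to a record that was actually retrieved."""
    cited = {int(m) for m in re.findall(r"\[R(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= n_records for c in cited)


records = [
    {"reference": "6205-2RS", "site": "site-A", "condition": "new", "price_eur": 12.0},
    {"reference": "6206-2RS", "site": "partner-B", "condition": "new", "price_eur": 14.0},
]
prompt = build_justification_prompt("sealed 6205 bearing, new", records)
```

Numbering records and checking citations is one simple guard against justifications that refer to parts the retriever never returned; it does not verify that the stated reasons are correct.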