Using Small Language Models to Reverse-Engineer Machine Learning Pipelines Structures

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the limitations of existing approaches for automatically extracting machine learning (ML) pipeline structures, which often rely on manual annotations or suffer from insufficient generalization to keep pace with the rapid evolution of the ML ecosystem. The work presents the first systematic evaluation of small language models (SLMs) for reverse-engineering ML pipelines and proposes an SLM-based method for their automatic identification and reconstruction. Through comprehensive comparative experiments across multiple SLMs and rigorous statistical validation using Cochran’s Q, McNemar, and Pearson’s chi-squared tests, the authors demonstrate that the best-performing SLM significantly outperforms current methods and exhibits robustness across diverse classification schemes. This approach uncovers finer-grained patterns in data science practices and overcomes longstanding bottlenecks in scalability and domain adaptability inherent in traditional techniques.

📝 Abstract
Background: Extracting the stages that structure Machine Learning (ML) pipelines from source code is key for gaining a deeper understanding of data science practices. However, the diversity caused by the constant evolution of the ML ecosystem (e.g., algorithms, libraries, datasets) makes this task challenging. Existing approaches either depend on non-scalable manual labeling or on ML classifiers that do not properly support the diversity of the domain. These limitations highlight the need for more flexible and reliable solutions. Objective: We evaluate whether Small Language Models (SLMs) can leverage their code understanding and classification abilities to address these limitations, and subsequently how they can advance our understanding of data science practices. Method: We conduct a confirmatory study based on two reference works selected for their relevance to the limitations of the current state of the art. First, we compare several SLMs using Cochran's Q test. The best-performing model is then evaluated against the reference studies using two distinct McNemar's tests. We further analyze how variations in taxonomy definitions affect performance through an additional Cochran's Q test. Finally, a goodness-of-fit analysis is conducted using Pearson's chi-squared tests to compare our insights on data science practices with those from prior studies.
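The Method paragraph chains three statistical tests: Cochran's Q (do k related classifiers differ?), McNemar's (does the best model beat a reference classifier on the same items?), and Pearson's chi-squared goodness of fit (do the extracted stage distributions match a prior study's?). A minimal sketch of that workflow, assuming hypothetical per-snippet correctness scores and invented stage counts (none of these numbers come from the paper):

```python
import numpy as np
from scipy.stats import chi2, chisquare

def cochrans_q(correct):
    """Cochran's Q test for k related binary samples.
    correct: (n_items, k_models) 0/1 matrix of per-item correctness."""
    x = np.asarray(correct)
    n, k = x.shape
    col = x.sum(axis=0)            # successes per model
    row = x.sum(axis=1)            # successes per item
    total = x.sum()
    q = (k - 1) * (k * (col ** 2).sum() - total ** 2) \
        / (k * total - (row ** 2).sum())
    return q, chi2.sf(q, k - 1)    # p-value: chi-squared, k-1 df

def mcnemar(a_correct, b_correct):
    """McNemar's test (chi-square with continuity correction)
    on the discordant pairs between two classifiers."""
    a = np.asarray(a_correct, bool)
    b = np.asarray(b_correct, bool)
    b01 = int((a & ~b).sum())      # A right, B wrong
    b10 = int((~a & b).sum())      # A wrong, B right
    stat = (abs(b01 - b10) - 1) ** 2 / (b01 + b10)
    return stat, chi2.sf(stat, 1)

# Hypothetical correctness of three SLMs on 12 labeled code snippets.
scores = np.array([
    [1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0],
    [0, 0, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0],
    [1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 0],
])

q, p_q = cochrans_q(scores)                      # do the k models differ?
stat, p_m = mcnemar(scores[:, 0], scores[:, 2])  # best SLM vs. a reference

# Goodness of fit: invented pipeline-stage counts vs. a prior study's.
observed = [40, 25, 20, 15]
expected = [35, 30, 20, 15]
chi, p_gof = chisquare(observed, f_exp=expected)
```

Cochran's Q generalizes McNemar's test to more than two models, which is why the paper uses it for the multi-SLM comparison and falls back to McNemar's for the pairwise comparison against each reference study.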
Problem

Research questions and friction points this paper is trying to address.

Machine Learning Pipelines
Code Understanding
Pipeline Structure Extraction
Data Science Practices
Small Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Small Language Models
Reverse-Engineering
Machine Learning Pipelines
Code Understanding
Statistical Evaluation