Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops

📅 2025-12-03

🤖 AI Summary

Problem: Selective disassembly of key components (e.g., RAM, CPU, HDD) from end-of-life desktop computers is difficult because the products are structurally heterogeneous, the operations demand high precision, and the tasks carry significant uncertainty. Method: The authors propose a hybrid control framework that pairs a vision-language-action (VLA) model with a rule-based controller. Using a custom-built dataset for robotic RAM and CPU disassembly, they fine-tune OpenVLA and OpenVLA-OFT, in what is presented as an early application of VLA models to complex electronic device disassembly. Contribution/Results: The fine-tuned VLA models complete several early, coarse-grained steps but fail on certain critical fine-grained subtasks, which causes the overall task to fail. The hybrid strategy, in contrast, completes the entire disassembly operation. The work identifies the precision and dexterity limitations of current VLA models and offers an analysis that may inform future research on end-to-end robotic disassembly.

📝 Abstract

Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.

Problem

Research questions and friction points this paper is trying to address.

Automating disassembly of variable end-of-life desktops for critical components.

Performing sequential, precise, and dexterous operations with staged pipelines whose explicitly modeled stages generalize poorly to unfamiliar scenarios.

Assessing whether vision-language-action models are feasible for complex disassembly tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned VLA models for robotic disassembly steps

Hybrid strategy combining VLA with rule-based controller

Custom dataset for RAM and CPU disassembly training
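
The hybrid strategy can be pictured as a per-step dispatcher: the disassembly task is split into small steps, and each step is handed either to the VLA policy or to a rule-based controller. The following is a minimal sketch of that idea only, not the authors' implementation. The step names, the `fine_grained` flag, the routing rule, and the stub controllers are all hypothetical; the source does not say which steps go to which controller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A controller takes a step name and reports whether the step succeeded.
Controller = Callable[[str], bool]


@dataclass(frozen=True)
class Step:
    name: str           # hypothetical step label, not taken from the paper
    fine_grained: bool  # assumed flag: True if the step needs high precision


def run_hybrid(
    steps: List[Step], vla: Controller, rule_based: Controller
) -> List[Tuple[str, str, bool]]:
    """Run steps in order, routing each to one controller; stop at the first failure."""
    log = []
    for step in steps:
        # Assumed routing rule: precise steps go to the rule-based controller.
        label, controller = ("rule", rule_based) if step.fine_grained else ("vla", vla)
        ok = controller(step.name)
        log.append((step.name, label, ok))
        if not ok:
            break  # a failed step ends the whole sequential task
    return log


# Stub controllers standing in for a fine-tuned VLA policy and scripted motions.
def stub_vla(step_name: str) -> bool:
    return True


def stub_rule(step_name: str) -> bool:
    return True


# Hypothetical RAM-removal sequence used only to exercise the dispatcher.
DEMO_STEPS = [
    Step("approach_ram_slot", fine_grained=False),
    Step("release_latches", fine_grained=True),
    Step("grasp_module", fine_grained=False),
    Step("extract_module", fine_grained=True),
]

if __name__ == "__main__":
    for name, label, ok in run_hybrid(DEMO_STEPS, stub_vla, stub_rule):
        print(f"{name}: {label} -> {'ok' if ok else 'failed'}")
```

Stopping at the first failed step mirrors the observation in the abstract that a failure on one critical subtask leads to failure of the whole task. In a real system the stubs would be replaced by calls into the fine-tuned model and the scripted controller.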
