ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

📅 2026-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key challenges in professional software automation: GUI agents suffer from fragile visual grounding and error accumulation over long horizons, while API-based approaches are limited by heterogeneous protocols and inaccessible commercial interfaces. To overcome these issues, the paper introduces COM-as-Action, a paradigm that uses the Component Object Model (COM) as a unified executable abstraction and reframes software interaction as deterministic program synthesis. The contributions include ComCADBench, the first benchmark for agents operating real industrial CAD software; ComActor, a self-correcting agent trained through a progressive three-stage framework; and ComForge, a scalable platform for large-scale training in Windows containers. Experiments show that ComActor achieves state-of-the-art performance on ComCADBench, stays resilient on long-horizon tasks where baselines collapse, and generalizes to an external CAD benchmark.

📝 Abstract

Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with heterogeneous protocols and inaccessible commercial interfaces. In this work, we identify the Component Object Model (COM) as a unified executable abstraction, proposing COM-as-Action: a new paradigm that reframes professional software interaction as deterministic program synthesis rather than sequential visual control. To validate this paradigm in the most demanding environments, we introduce ComCADBench, the first benchmark for agents operating real industrial CAD software. Our experiments reveal a substantial paradigm gap: frontier proprietary models achieve near-zero success under GUI-based interaction, whereas COM-based execution yields substantial immediate gains. To bridge the remaining gap between syntactic correctness and geometric accuracy, we develop ComActor, a self-correcting agent trained through a progressive three-stage framework, alongside ComForge, a scalable platform for large-scale training in Windows containers. Extensive experiments show that ComActor achieves state-of-the-art performance on ComCADBench, with strong resilience in long-horizon tasks where baselines collapse, and generalizes to an external CAD benchmark.
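The distinction the abstract draws between syntactic correctness and geometric accuracy can be illustrated with a minimal sketch. It is not the paper's implementation: `FakeCadApp`, `FakeDocument`, `AddBox`, `TotalVolume`, `run_action`, and `geometry_matches` are all hypothetical names invented here, and the in-memory classes only stand in for the object model that a real CAD application would expose through COM on Windows. The sketch treats a whole generated program as one action, executes it against the object model, and then checks the resulting geometry separately from whether the program ran.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a CAD document exposed through COM."""
    solids: list = field(default_factory=list)

    def AddBox(self, length, width, height):
        # Only records the box; a real object model would build geometry.
        self.solids.append((length, width, height))

    def TotalVolume(self):
        return sum(l * w * h for (l, w, h) in self.solids)


@dataclass
class FakeCadApp:
    """Hypothetical stand-in for a COM application object."""
    documents: list = field(default_factory=list)

    def NewDocument(self):
        doc = FakeDocument()
        self.documents.append(doc)
        return doc


def run_action(program, app):
    """Execute one generated program as a single action.

    Returns (ok, error). ok only means the program ran without raising;
    it says nothing about whether the geometry is what the task asked for.
    """
    try:
        exec(program, {"app": app})
        return True, ""
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def geometry_matches(app, expected_volume, tol=1e-6):
    """Check the latest document's total volume against the task target."""
    if not app.documents:
        return False
    return abs(app.documents[-1].TotalVolume() - expected_volume) <= tol


# Task (assumed for illustration): create a 10 x 20 x 5 box, volume 1000.
good_program = "doc = app.NewDocument()\ndoc.AddBox(10, 20, 5)\n"
wrong_program = "doc = app.NewDocument()\ndoc.AddBox(10, 20, 50)\n"
broken_program = "doc = app.NewDocument()\ndoc.AddCube(10)\n"

if __name__ == "__main__":
    for name, program in [("good", good_program),
                          ("wrong", wrong_program),
                          ("broken", broken_program)]:
        app = FakeCadApp()
        ok, error = run_action(program, app)
        print(name, ok, geometry_matches(app, 1000.0), error)
```

In this sketch, `wrong_program` runs without error yet fails the volume check, while `broken_program` fails at execution and returns an error message that an agent could use as feedback for another attempt. How ComActor actually performs self-correction is not described in the abstract, so the feedback path here is an assumption.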

Problem

Research questions and friction points this paper is trying to address.

software manipulation

GUI-based agents

API-based approaches

Component Object Model

professional software interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

COM-as-Action

program synthesis

computer-use agents