🤖 AI Summary
This work proposes a lightweight, task-specific six-degree-of-freedom (6-DOF) cinematographic robotic arm designed to overcome the high cost and operational complexity that hinder the widespread adoption of industrial-grade film robots. By integrating a fully 3D-printed mechanical structure with a goal-conditioned visuomotor imitation learning framework, the system achieves autonomous camera motion control without explicit geometric programming. The policy is trained end to end from human demonstrations using Action Chunking with Transformers (ACT). The complete platform costs under $1,000 in hardware, supports a 1.5 kg payload, and achieves approximately 1 mm repeatability. In real-world experiments it accurately reproduces diverse cinematic camera trajectories and generalizes across them, bringing high-precision robotic cinematography below the $1,000 price point.
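ACT predicts a chunk of future actions at each control step rather than a single action. The sketch below illustrates temporal ensembling, the smoothing scheme described in the original ACT work, in which every chunk that still covers the current step contributes its prediction for that step, and the predictions are averaged with weights w_i = exp(-m·i), where i = 0 is the oldest prediction. Whether IRIS uses temporal ensembling, as well as the chunk size, the action dimension (six joint targets here) and m, are assumptions; `TemporalEnsembler` is a hypothetical helper, not the authors' code.

```python
import numpy as np


class TemporalEnsembler:
    """Hypothetical helper: blends overlapping ACT-style action chunks.

    Assumptions (not from the IRIS paper): chunk size k, action_dim = 6 joint
    targets, and the exponential weighting constant m.
    """

    def __init__(self, chunk_size: int, action_dim: int = 6, m: float = 0.01):
        self.k = chunk_size
        self.action_dim = action_dim
        self.m = m
        self.history = []  # (start_step, chunk of shape (k, action_dim)), oldest first

    def step(self, t: int, chunk: np.ndarray) -> np.ndarray:
        """Register the chunk predicted at step t and return the action to execute at t."""
        chunk = np.asarray(chunk, dtype=float)
        if chunk.shape != (self.k, self.action_dim):
            raise ValueError(f"expected chunk shape {(self.k, self.action_dim)}, got {chunk.shape}")
        self.history.append((t, chunk))
        # Drop chunks issued k or more steps ago: they no longer cover step t.
        self.history = [(s, c) for (s, c) in self.history if t - s < self.k]
        # Each surviving chunk's prediction for step t, ordered oldest first.
        preds = np.stack([c[t - s] for (s, c) in self.history])
        # w_i = exp(-m * i) with i = 0 for the oldest prediction, normalised to sum to 1.
        weights = np.exp(-self.m * np.arange(len(preds)))
        weights /= weights.sum()
        return weights @ preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ens = TemporalEnsembler(chunk_size=20)
    for t in range(50):
        chunk = rng.normal(size=(20, 6))  # stand-in for the policy's predicted chunk
        action = ens.step(t, chunk)
    print(action.shape)  # (6,)
```

Because each executed action averages several overlapping predictions, abrupt changes between consecutive chunks are damped, which is one plausible route to the perceptually smooth camera motion the abstract describes.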
📝 Abstract
Robotic camera systems enable dynamic, repeatable motion beyond human capabilities, yet their adoption remains limited by the high cost and operational complexity of industrial-grade platforms. We present the Intelligent Robotic Imaging System (IRIS), a task-specific 6-DOF manipulator designed for autonomous, learning-driven cinematic motion control. IRIS integrates a lightweight, fully 3D-printed hardware design with a goal-conditioned visuomotor imitation learning framework based on Action Chunking with Transformers (ACT). The system learns object-aware and perceptually smooth camera trajectories directly from human demonstrations, eliminating the need for explicit geometric programming. The complete platform costs under US$1,000, supports a 1.5 kg payload, and achieves approximately 1 mm repeatability. Real-world experiments demonstrate accurate trajectory tracking, reliable autonomous execution, and generalization across diverse cinematic motions.
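The abstract reports approximately 1 mm repeatability without stating the measurement protocol. As an assumption, the sketch below computes positional repeatability in the style of ISO 9283: the arm revisits one commanded pose n times, l_j is the distance of each attained position from the barycenter of all attained positions, and RP = l̄ + 3·S_l, where l̄ is the mean and S_l the sample standard deviation of the l_j. The function name and the synthetic measurements are hypothetical and are not IRIS data.

```python
import numpy as np


def positional_repeatability(points) -> float:
    """ISO 9283-style positional repeatability RP = l_bar + 3 * S_l (assumed protocol).

    points: (n, 3) array of attained positions (e.g. in mm) for n >= 2 visits
    to the same commanded pose.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[0] < 2:
        raise ValueError("need an (n, 3) array with n >= 2")
    barycenter = points.mean(axis=0)
    dists = np.linalg.norm(points - barycenter, axis=1)  # l_j for each visit
    l_bar = dists.mean()
    s_l = dists.std(ddof=1)  # sample standard deviation of l_j
    return l_bar + 3.0 * s_l


if __name__ == "__main__":
    # Synthetic measurements only: 30 visits with 0.2 mm Gaussian noise per axis.
    rng = np.random.default_rng(0)
    commanded = np.array([400.0, 0.0, 300.0])
    measured = commanded + rng.normal(scale=0.2, size=(30, 3))
    print(f"RP = {positional_repeatability(measured):.3f} mm")
```

Measuring spread about the barycenter rather than about the commanded pose separates repeatability from absolute accuracy, so a constant calibration offset does not inflate the figure.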