S2M-Trek: From Single to Multi-Sphere Transport via Per-Frame Deep Sets on a Wheel-Legged Robot

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of simultaneously transporting multiple free-rolling spheres on the back of a wheel-legged quadruped, without fences, grippers, or mechanical stops. Because the spheres are indistinguishable and their ordering can change independently at every history frame, the observation carries a per-frame permutation symmetry that hinders policy learning. To address this, the authors propose Per-Frame Deep Sets (PFDS), which applies permutation-invariant pooling independently within each frame of the observation history before temporal modeling, so that the frame-level symmetry is enforced by construction. They prove that PFDS is $G_{\text{frame}}$-invariant and universally approximates continuous $G_{\text{frame}}$-invariant policies. In simulation, PFDS reaches the five-sphere stage with 100% no-drop transport across all five random seeds, outperforming flat MLPs, branch-wise encoders, and a history-concatenation Deep Sets baseline. The PFDS teacher is then distilled via DAgger into TactSet, a compact and naturally $G_{\text{frame}}$-invariant tactile representation built on a 16×16 Boolean union contact map, which replaces privileged sphere-state observations without performance degradation.
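The pooling order described above can be illustrated with a toy forward pass. This is a minimal sketch, not the authors' implementation: the per-sphere encoder `phi`, the temporal readout `rho`, the fixed weights `W_PHI`, and the per-sphere feature layout (x, y, vx, vy) are all hypothetical, and a real policy would use learned networks. What it does show is the structure: a shared encoder is applied to every sphere, features are sum-pooled within each frame, and only the pooled per-frame vectors are combined across time, so reordering spheres independently in each frame leaves the output unchanged.

```python
import math
import random
from typing import List, Sequence

# Hypothetical layout: each sphere is (x, y, vx, vy) relative to the robot base.
Sphere = Sequence[float]
Frame = List[Sphere]        # unordered set of spheres observed at one time step
History = List[Frame]       # frames ordered oldest to newest

# Hypothetical fixed weights standing in for a learned per-sphere MLP (phi).
W_PHI = [
    [0.5, -0.2, 0.1, 0.3],
    [-0.4, 0.6, 0.2, -0.1],
    [0.2, 0.1, -0.5, 0.4],
]


def phi(sphere: Sphere) -> List[float]:
    """Shared per-sphere encoder: a linear layer followed by tanh."""
    return [math.tanh(sum(w * s for w, s in zip(row, sphere))) for row in W_PHI]


def pool_frame(frame: Frame) -> List[float]:
    """Sum-pool phi over the spheres of ONE frame (permutation-invariant)."""
    pooled = [0.0] * len(W_PHI)
    for sphere in frame:
        for i, value in enumerate(phi(sphere)):
            pooled[i] += value
    return pooled


def rho(per_frame: List[List[float]]) -> float:
    """Hypothetical temporal readout over the ordered pooled frames.

    Frame order matters here (later positions get larger weights); only the
    sphere order inside each frame has been removed by pool_frame.
    """
    flat = [v for pooled in per_frame for v in pooled]
    weights = [(k + 1) / len(flat) for k in range(len(flat))]
    return math.tanh(sum(w * v for w, v in zip(weights, flat)))


def pfds_policy(history: History) -> float:
    """Per-Frame Deep Sets: pool within each frame, then read out over time."""
    return rho([pool_frame(frame) for frame in history])


# Demo: shuffle the sphere order independently in every frame (a G_frame action).
rng = random.Random(0)
history = [[[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)] for _ in range(3)]
shuffled = [rng.sample(frame, len(frame)) for frame in history]
print(abs(pfds_policy(history) - pfds_policy(shuffled)) < 1e-9)  # True
```

Sum pooling is used here only because it is the simplest permutation-invariant aggregator; mean or max pooling would preserve the same per-frame invariance.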
📝 Abstract
We study the problem of scaling dynamic loco-manipulation from a single free-rolling sphere to multiple spheres transported simultaneously on the back of a wheel-legged quadruped, without fences, grippers, or mechanical stops. Multiple identical free-rolling spheres form an unordered set with no persistent identity: their ordering may change independently at each history frame, creating a per-frame permutation symmetry that standard history-concatenation set encoders do not explicitly enforce; these encoders impose only a shared, diagonal permutation symmetry over the full history. We show that this symmetry mismatch leads to a concrete failure mode in curriculum-based reinforcement learning. Within the same PPO training budget, flat MLPs and branch-wise encoders plateau at or below the two-sphere stage, while a history-concatenation Deep Sets baseline (HCDS) fails to progress past the two-sphere stage in our runs unless ball-to-slot assignments are randomised during training, suggesting that it exploits slot indices as a curriculum shortcut rather than learning identity-free multi-sphere dynamics. We propose Per-Frame Deep Sets (PFDS), which performs permutation-invariant pooling within each history frame before temporal readout; we prove that PFDS is $G_{\text{frame}}$-invariant and universally approximates continuous $G_{\text{frame}}$-invariant policies. A 2×2 ablation over encoder architecture and slot randomisation separates the architectural and data-augmentation pathways, and PFDS reaches the five-sphere stage with 100% no-drop transport in simulation across all five random seeds. We further distill the PFDS teacher into TactSet via DAgger, replacing privileged sphere-state observations with a 16×16 Boolean union contact map, yielding a compact and naturally $G_{\text{frame}}$-invariant tactile representation.
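The diagonal-versus-per-frame symmetry mismatch can be checked on a toy example. This sketch does not reproduce code from the paper: `hcds_features`, `pfds_features`, and `union_contact_map` are hypothetical names, the encoder `phi` is a fixed nonlinear map rather than a trained network, and the HCDS structure is an assumption based on the description above (each set element is one slot's concatenated history). Under that assumption HCDS is invariant only when the same permutation is applied to every frame, so swapping spheres in a single frame changes its output while the per-frame pooled features stay the same. The union contact map assumes, again hypothetically, that each sphere's contact rasterises to one cell of a 16×16 grid; the Boolean OR over spheres does not depend on their order.

```python
import math
from typing import List, Sequence, Tuple

# Toy observation: history[t][i] = scalar feature of sphere slot i at frame t.
History = List[List[float]]


def phi(values: Sequence[float]) -> float:
    """Fixed nonlinear stand-in for a learned element encoder (hypothetical)."""
    return math.tanh(sum((k + 1) * v for k, v in enumerate(values)))


def hcds_features(history: History) -> float:
    """History-concatenation Deep Sets: one set element per slot trajectory."""
    num_slots = len(history[0])
    trajectories = [[frame[i] for frame in history] for i in range(num_slots)]
    return sum(phi(traj) for traj in trajectories)


def pfds_features(history: History) -> List[float]:
    """Per-Frame Deep Sets: pool within each frame, keep the frame order."""
    return [sum(phi([v]) for v in frame) for frame in history]


def union_contact_map(cells: Sequence[Tuple[int, int]], size: int = 16) -> List[List[bool]]:
    """Boolean OR of per-sphere contact cells on a size x size grid."""
    grid = [[False] * size for _ in range(size)]
    for row, col in cells:
        grid[row][col] = True
    return grid


history = [[0.1, 0.9], [0.2, 0.8]]               # two spheres, two frames
diagonal = [frame[::-1] for frame in history]     # same swap in every frame
per_frame = [history[0], history[1][::-1]]        # swap only in the second frame

print(math.isclose(hcds_features(history), hcds_features(diagonal)))   # True
print(math.isclose(hcds_features(history), hcds_features(per_frame)))  # False
print(all(math.isclose(a, b)
          for a, b in zip(pfds_features(history), pfds_features(per_frame))))  # True

cells = [(3, 4), (10, 12), (7, 7)]
print(union_contact_map(cells) == union_contact_map(cells[::-1]))      # True
```

The HCDS failure here is structural rather than numerical: once slot trajectories are formed by concatenation, a per-frame reordering produces different trajectories, which is the kind of slot-index dependence the abstract links to the curriculum shortcut.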
Problem

Research questions and friction points this paper is trying to address.

multi-sphere transport
per-frame permutation symmetry
loco-manipulation
wheel-legged robot
identity-free dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Per-Frame Deep Sets
permutation symmetry
loco-manipulation
wheel-legged robot
tactile representation
Authors
Zong Chen, School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Xuebin Li, School of Mathematics, Harbin Institute of Technology
Jinpeng Xiao, School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Shaoyang Li, School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Ben Liu, School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Min Li, School of Mechanical Science and Engineering, Huazhong University of Science and Technology
Zhouping Yin, Professor of Mechanical Science and Engineering, Huazhong University of Science and Technology (research interests: electronic manufacturing, digital modelling)
Yiqun Li, Institute for Infocomm Research, A*STAR (research interests: computer vision, deep learning, augmented reality)