KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN&RWKV

📅 2026-02-01
🤖 AI Summary
This work addresses the inefficiency of diffusion models in robotic manipulation and their reliance on computationally heavy UNet architectures by proposing KAN-We-Flow, a lightweight and efficient visuomotor policy backbone that uniquely integrates RWKV with Kolmogorov–Arnold Networks (KAN). The method introduces an RWKV-KAN module to enable effective context propagation and nonlinear feature calibration, complemented by an action consistency regularization to enhance policy accuracy. Leveraging 3D flow matching combined with an Euler extrapolation auxiliary loss, the approach achieves state-of-the-art success rates on the Adroit, Meta-World, and DexArt benchmarks while reducing parameter count by 86.8%, substantially improving deployment feasibility on resource-constrained devices.

📝 Abstract
Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on RWKV outputs. Moreover, we introduce an Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision to stabilize training and improve policy precision. Without resorting to large UNets, our design reduces parameters by 86.8%, maintains fast runtime, and achieves state-of-the-art success rates on Adroit, Meta-World, and DexArt benchmarks. Our project page is available at https://zhihaochen-2003.github.io/KAN-We-Flow.github.io/
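The abstract does not spell out the loss formulas, so the following is only a minimal sketch of how a flow-matching objective and an Euler-extrapolation consistency term could fit together. It assumes the common straight-line path between a noise sample and an expert action chunk; the function names (`flow_matching_terms`, `euler_sample`), the array shapes, and the `oracle` velocity field are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def flow_matching_terms(x0, x1, t, velocity_fn):
    """Return (flow-matching loss, Euler-extrapolation consistency loss).

    x0: noise sample, x1: expert action chunk, t: scalar in [0, 1].
    Assumes a straight-line path x_t = (1 - t) * x0 + t * x1 (an assumption,
    not confirmed by the source).
    """
    x_t = (1.0 - t) * x0 + t * x1        # point on the assumed path
    v_target = x1 - x0                   # constant velocity along that path
    v_pred = velocity_fn(x_t, t)
    fm_loss = np.mean((v_pred - v_target) ** 2)
    # Extrapolate from time t to time 1 with a single Euler step and compare
    # the result against the expert actions (consistency-style term).
    x1_hat = x_t + (1.0 - t) * v_pred
    consistency_loss = np.mean((x1_hat - x1) ** 2)
    return fm_loss, consistency_loss

def euler_sample(x0, velocity_fn, num_steps=1):
    """Integrate the velocity field from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))   # noise: 4 steps x 2 action dims (hypothetical)
x1 = rng.standard_normal((4, 2))   # stand-in for an expert action chunk

oracle = lambda x, t: x1 - x0            # ideal velocity for this single pair
zero = lambda x, t: np.zeros_like(x)     # untrained stand-in: predicts no motion

fm_loss, acr_loss = flow_matching_terms(x0, x1, 0.3, oracle)
bad_fm_loss, bad_acr_loss = flow_matching_terms(x0, x1, 0.3, zero)
sample = euler_sample(x0, oracle, num_steps=1)
```

With the ideal velocity field both terms vanish and a single Euler step recovers the expert chunk, which is the sense in which flow matching can reduce the number of sampling steps. With the zero field both terms are positive, so the consistency term supplies a second training signal in action space rather than velocity space.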
Problem

Research questions and friction points this paper is trying to address.

diffusion-based visuomotor policies
inference inefficiency
flow matching
resource-constrained robots
large UNet architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
Kolmogorov-Arnold Networks (KAN)
RWKV
Action Consistency Regularization
Lightweight Policy
Authors
Zhihao Chen
School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications (BUPT), China; and Beijing Hydrogen Intelligence Technology Co. Ltd., China
Yiyuan Ge
School of Electronic and Information Engineering, South China University of Technology, China
Ziyang Wang
Aston University; The Alan Turing Institute; University of Oxford
Computer Vision, Healthcare AI, Robotics