KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN&RWKV

📅 2026-02-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inefficiency of diffusion models in robotic manipulation and their reliance on computationally heavy UNet architectures by proposing KAN-We-Flow, a lightweight and efficient visuomotor policy backbone that uniquely integrates RWKV with Kolmogorov–Arnold Networks (KAN). The method introduces an RWKV-KAN module to enable effective context propagation and nonlinear feature calibration, complemented by an action consistency regularization to enhance policy accuracy. Leveraging 3D flow matching combined with an Euler extrapolation auxiliary loss, the approach achieves state-of-the-art success rates on the Adroit, Meta-World, and DexArt benchmarks while reducing parameter count by 86.8%, substantially improving deployment feasibility on resource-constrained devices.
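To make "nonlinear feature calibration" concrete, here is a minimal sketch of a groupwise, spline-based feature mapping in plain Python. It is an illustration under stated assumptions, not the paper's GroupKAN layer: the names `hat_basis`, `spline`, and `group_kan` are hypothetical, the spline is piecewise linear (order-1 B-spline) on a uniform grid, and each channel group simply shares one learnable univariate function.

```python
def hat_basis(x, grid):
    """Order-1 B-spline (hat) basis values of x on a uniform, sorted grid."""
    h = grid[1] - grid[0]
    x = min(max(x, grid[0]), grid[-1])  # clamp x to the grid range
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


def spline(x, grid, coeffs):
    """Piecewise-linear function whose value at grid[k] is coeffs[k]."""
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, grid)))


def group_kan(features, grid, group_coeffs):
    """Apply one shared univariate spline per channel group (assumed design).

    features:     list of channel values
    group_coeffs: one coefficient list per group; these would be the
                  learnable parameters in a trained model
    """
    num_groups = len(group_coeffs)
    assert len(features) % num_groups == 0, "channels must split evenly"
    group_size = len(features) // num_groups
    return [
        spline(x, grid, group_coeffs[i // group_size])
        for i, x in enumerate(features)
    ]


# Two groups on the grid [-1, 0, 1]: the first spline is the identity,
# the second approximates |x|.
grid = [-1.0, 0.0, 1.0]
coeffs = [[-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
calibrated = group_kan([0.5, -0.5, 0.5, -0.5], grid, coeffs)
```

Sharing coefficients within a group keeps the parameter count proportional to the number of groups times the grid size rather than to the number of channels, which is one plausible reason a grouped variant suits a lightweight backbone.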

📝 Abstract

Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov–Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on RWKV outputs. Moreover, we introduce an Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision to stabilize training and improve policy precision. Without resorting to large UNets, our design reduces parameters by 86.8%, maintains fast runtime, and achieves state-of-the-art success rates on the Adroit, Meta-World, and DexArt benchmarks. Our project page can be viewed at https://zhihaochen-2003.github.io/KAN-We-Flow.github.io/.
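The abstract does not spell out the training objective, so the following is only a sketch of how a flow-matching loss and an Euler-extrapolation consistency term could fit together. It assumes a straight-line path from a noise sample `x0` to an expert action `x1`; the function names and the exact form of the consistency term are hypothetical and are not taken from the paper.

```python
def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 to action x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Regress the predicted velocity at x_t onto the path velocity."""
    x_t = interpolate(x0, x1, t)
    return mse(velocity_fn(x_t, t), target_velocity(x0, x1))


def euler_extrapolation_loss(velocity_fn, x0, x1, t):
    """Assumed ACR-style term: one Euler step from x_t to t = 1,
    compared against the expert action x1."""
    x_t = interpolate(x0, x1, t)
    v = velocity_fn(x_t, t)
    x1_hat = [x + (1.0 - t) * vi for x, vi in zip(x_t, v)]
    return mse(x1_hat, x1)


def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate the learned field from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a velocity function that matches the straight-line path exactly, both loss terms vanish and a single Euler step recovers the expert action, which is the sense in which flow matching can reduce sampling to one step. In practice `velocity_fn` would be the policy network conditioned on observations, and the two terms would be combined with a weighting that this sketch leaves unspecified.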

Problem

Research questions and friction points this paper is trying to address.

diffusion-based visuomotor policies

inference inefficiency

flow matching

resource-constrained robots

large UNet architectures

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching

Kolmogorov-Arnold Networks (KAN)

RWKV