AI Summary
Imitation learning (IL) policy design involves a vast, unexplored design space encompassing feature encoding schemes, neural architectures, and optimization paradigms. Method: We propose X-IL, the first full-stack modular and plug-and-play IL framework, enabling flexible substitution of feature encoders, diverse backbone architectures (e.g., Transformer, Mamba, xLSTM), and generative optimization methods (e.g., score matching, flow matching). Contribution/Results: We systematically evaluate hundreds of configurations across mainstream robotic IL benchmarks. Our analysis reveals novel high-performing combinations surpassing state-of-the-art results and uncovers principled performance trade-offs among components. To foster reproducibility and standardization in IL policy engineering, we open-source a comprehensive configuration library and an empirical analysis guide with validated implementations.
Abstract
Designing modern imitation learning (IL) policies requires making numerous decisions, including the selection of feature encoding, architecture, policy representation, and more. As the field rapidly advances, the range of available options continues to grow, creating a vast and largely unexplored design space for IL policies. In this work, we present X-IL, an accessible open-source framework designed to systematically explore this design space. The framework's modular design enables seamless swapping of policy components, such as backbones (e.g., Transformer, Mamba, xLSTM) and policy optimization techniques (e.g., score matching, flow matching). This flexibility facilitates comprehensive experimentation and has led to the discovery of novel policy configurations that outperform existing methods on recent robot learning benchmarks. Our experiments demonstrate not only significant performance gains but also provide valuable insights into the strengths and weaknesses of various design choices. This study serves as both a practical reference for practitioners and a foundation for guiding future research in imitation learning.
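The plug-and-play design described above can be illustrated with a minimal registry-and-builder sketch. This is a hypothetical illustration, not X-IL's actual API: the names `PolicyConfig`, `build_policy`, and the registry sets are invented for this example, and the builder returns a label instead of constructing real neural modules.

```python
# Hypothetical sketch of a plug-and-play policy assembly pattern.
# Names (PolicyConfig, build_policy, BACKBONES, POLICY_HEADS) are
# illustrative and NOT taken from the X-IL codebase.
from dataclasses import dataclass

# Registries of swappable components, keyed by config strings.
BACKBONES = {"transformer", "mamba", "xlstm"}
POLICY_HEADS = {"score_matching", "flow_matching"}


@dataclass
class PolicyConfig:
    """A configuration selecting one backbone and one policy head."""
    backbone: str
    head: str


def build_policy(cfg: PolicyConfig) -> str:
    """Validate the configuration and return a description of the
    assembled policy (a stand-in for instantiating real modules)."""
    if cfg.backbone not in BACKBONES:
        raise ValueError(f"unknown backbone: {cfg.backbone}")
    if cfg.head not in POLICY_HEADS:
        raise ValueError(f"unknown policy head: {cfg.head}")
    return f"{cfg.backbone}+{cfg.head}"


# Swapping a component is a one-line config change:
print(build_policy(PolicyConfig("mamba", "flow_matching")))
```

Keeping each component behind a small registry like this is what makes a design-space sweep cheap: each of the hundreds of evaluated configurations is just a different key combination.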