[2510.19495] Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

[Submitted on 22 Oct 2025 (v1), last revised 25 Oct 2025 (this version, v2)]

Authors: Kevin Huang, Rosario Scalise, Cleah Winston, Ayush Agrawal, Yunchu Zhang, Rohan Baijal, Markus Grotz, Byron Boots, Benjamin Burchfiel, Masha Itkina, Paarth Shah, Abhishek Gupta

Abstract: Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, which restricts its ability to adapt to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data (such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies) can offer broader coverage and lower collection costs. Conventional imitation learning methods, however, fail to use this data effectively. To address these challenges, we posit that with the right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL methods can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications allow this data to be used without significant additional assumptions. Our approach shows that broadening the support of the policy distribution allows imitation algorithms augmented by offline RL to solve tasks robustly, with considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions from which learned policies succeed when non-expert data is incorporated. Moreover, we show that these methods can leverage all collected data, including partial or suboptimal demonstrations, to improve task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics. Website: this https URL