🤖 Humanoid 🦾 Industrial & Cobot 🚚 AGV / AMR 🐕 Quadruped ⚙️ Reducers · Servos · Sensors 🚁 Drones & Autonomy 🧠 Embodied AI
Robos News
Robotics

WoVR: World Models as Reliable Simulators for Post-Training VLA Policies with RL

arXiv:2602.13977v2 Announce Type: replace Abstract: Reinforcement learning (RL) promises to unlock capabilities beyond imitation learning for Vision--Language--Action (VLA) models, but its requirement for massive real-world interaction prevents direct deployment on physical robots. Recent work attempts to use learned world models as simulators for policy optimization, yet closed-loop imagined rollouts inevitably suffer from hallucination and long-horizon error accumulation. Such errors not only

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2602.13977v2 Announce Type: replace Abstract: Reinforcement learning (RL) promises to unlock capabilities beyond imitation learning for Vision--Language--Action (VLA) models, but its requirement for massive real-world interaction prevents direct deployment on physical robots. Recent work attempts to use learned world models as simulators for policy optimization, yet closed-loop imagined rollouts inevitably suffer from hallucination and long-horizon error accumulation. Such errors not only degrade visual fidelity, but also mislead policy optimization by providing unreliable learning signals. We propose WoVR, a reliable world-model-based RL framework for post-training VLA policies. Instead of assuming a faithful world model, WoVR explicitly regulates how RL interacts with imperfect imagined dynamics. It improves rollout stability through a controllable action-conditioned video world model, reshapes imagined interaction to reduce effective error depth via Keyframe-Initialized Rollouts, and maintains policy--simulator alignment through World Model-Policy co-evolution. Extensive experiments demonstrate that WoVR enables stable long-horizon imagined rollouts and effective policy optimization, achieving superior LIBERO performance and consistent real-world gains across multiple robotic platforms. These results show that world models can serve as practical simulators for RL when hallucination is explicitly controlled. Additional visualization results are available at https://wovr-corl.github.io.

Source

Originally published at arxiv.org.

Related Articles

CD
Robos News Newsroom

Robos News covers markets, crypto and commodities for Asia & the Middle East — tier-1 desk research, AI-driven analysis, institutional-grade data. Tip our newsroom: [email protected]

Email the newsroom →
Disclaimer: This article is for informational purposes only and does not constitute investment advice. Data may be delayed up to 15 minutes. Past performance is not indicative of future results. Consult a licensed financial advisor before making investment decisions.

Related Stories

More from News →