RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models
Overview
arXiv:2506.17639v2. Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable capabilities and strong potential in complex robotic manipulation. However, their large parameter counts and high inference latency hinder real-world deployment, especially on resource-constrained platforms. To address this, we conduct a systematic empirical study of model compression for VLAs. Building on the insights from that study, we present RLRC, a three-stage compression and recovery pipeline consisting of structured pruning, performance recovery via supervised fine-tuning (SFT) and reinforcement learning (RL), and subsequent quantization. The RL stage incorporates a critic warm-up strategy and behavior cloning (BC) loss regularization to stabilize training and preserve policy behavior. RLRC achieves up to an 8x memory reduction and a 2.3x inference speedup while maintaining the original task success rate. Extensive experiments across multiple VLA backbones show that RLRC consistently outperforms existing compression baselines, highlighting its effectiveness for on-device deployment. Project website: https://rlrc-vla.github.io
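To make the two compression stages named in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the row-norm pruning criterion, the per-tensor symmetric quantizer, the 50% keep ratio, the 16-bit and 4-bit widths, and all function names are assumptions chosen for illustration, and the SFT and RL recovery stage that sits between pruning and quantization is omitted.

```python
import math

def prune_rows(weights, keep_ratio):
    """Structured pruning (assumed criterion): drop whole rows with the smallest L2 norm."""
    n_keep = max(1, round(len(weights) * keep_ratio))
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    # Pick the rows with the largest norms, then restore their original order.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [weights[i] for i in keep]

def quantize_symmetric(weights, bits):
    """Symmetric uniform quantization to signed integers with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(w / scale))) for w in row] for row in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [[v * scale for v in row] for row in q]

def memory_bits(n_rows, n_cols, bits):
    """Storage for a dense matrix, ignoring the small overhead of the scale."""
    return n_rows * n_cols * bits

# Hypothetical 4x2 weight matrix, imagined as stored in 16-bit precision.
weights = [[0.5, -1.0], [0.01, 0.02], [2.0, 1.5], [-0.03, 0.0]]

pruned = prune_rows(weights, keep_ratio=0.5)       # stage 1: structured pruning
# (stage 2, recovery via SFT and RL, would retrain the pruned model here)
codes, scale = quantize_symmetric(pruned, bits=4)  # stage 3: quantization
restored = dequantize(codes, scale)

before = memory_bits(len(weights), len(weights[0]), 16)
after = memory_bits(len(pruned), len(pruned[0]), 4)
reduction = before / after
```

In this toy setting, halving the row count and moving from 16-bit to 4-bit storage multiply together, which shows how pruning and quantization compound. The specific ratios here are illustrative only and should not be read as the configuration behind the results reported in the abstract.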
Source
Originally published at arxiv.org.
Source: https://arxiv.org/abs/2506.17639