Who reported this story?

This story was reported by arXiv cs.RO.

LA4VLA: Learning to Act without Seeing via Language-Action Pretraining

Robos News Newsroom

Editorial Desk

Published June 26, 2026 · Category: Robotics

Overview

arXiv:2606.27295v1

Abstract: Vision-Language-Action (VLA) models are commonly pretrained on robot demonstrations by jointly mapping visual observations and language instructions to actions. However, dense visual-action supervision can dominate the comparatively sparse language-action signal. As a result, policies may rely on visual shortcuts rather than learn how language conditions action execution, making them sensitive to visual variations.

To address this limitation, we propose LA4VLA, a language-action pretraining framework that enables policies to acquire language-conditioned action priors without visual observations. These priors capture reusable manipulation skills shared across tasks and scenes, reducing reliance on scene-specific visual cues. Specifically, LA4VLA decomposes expert demonstration trajectories into atomic action segments and pairs each segment with a corresponding low-level action description. This yields LA4-33K, a dataset of 33K Language-Action (LA) episodes derived entirely from existing demonstrations without additional robot data collection.

We further develop LA4VLA-1B, a lightweight 1B-parameter VLA model, and investigate three paradigms for incorporating language-action supervision into VLA learning: LA-only pretraining, sequential LA-to-VLA pretraining, and mixed LA-VLA pretraining.

Across simulation and real-world tasks, LA-pretrained policies consistently outperform matched VLA-pretrained counterparts, while combining LA and VLA supervision leads to further gains. In particular, mixed LA-VLA pretraining improves the average success rate of LA4VLA-1B over the no-pretraining baseline by up to 17.8 and 45.0 percentage points in simulation and real-world tasks, respectively. These results establish LA4VLA as an effective and complementary pretraining strategy for building stronger and more robust VLA policies.
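To make the idea of language-action episodes more concrete, here is a minimal Python sketch of how a demonstration might be split into atomic segments and paired with low-level descriptions. It is not the authors' pipeline: the abstract does not say how segments are found or how descriptions are written. The action format (`dx, dy, dz, gripper`), the axis-to-direction naming, the `LAEpisode` type, and the functions `primitive_label` and `segment_trajectory` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed action format (illustration only): end-effector translation
# delta plus a gripper command, where 1 = closed and 0 = open.
Action = Tuple[float, float, float, int]


@dataclass
class LAEpisode:
    description: str       # low-level language description of the segment
    actions: List[Action]  # the action segment; no images are stored


def primitive_label(action: Action, prev_gripper: int) -> str:
    """Hypothetical heuristic: name one action by gripper change or dominant axis."""
    dx, dy, dz, grip = action
    if grip != prev_gripper:
        return "close the gripper" if grip == 1 else "open the gripper"
    # Pick the translation axis with the largest magnitude.
    axis, value = max(zip("xyz", (dx, dy, dz)), key=lambda p: abs(p[1]))
    if abs(value) < 1e-6:
        return "hold still"
    # Assumed axis convention: +x forward, +y left, +z up.
    names = {
        "x": ("move forward", "move backward"),
        "y": ("move left", "move right"),
        "z": ("move up", "move down"),
    }
    return names[axis][0] if value > 0 else names[axis][1]


def segment_trajectory(actions: List[Action], initial_gripper: int = 0) -> List[LAEpisode]:
    """Group consecutive actions that share a label into one LA episode."""
    episodes: List[LAEpisode] = []
    prev_gripper = initial_gripper
    for action in actions:
        label = primitive_label(action, prev_gripper)
        if episodes and episodes[-1].description == label:
            episodes[-1].actions.append(action)
        else:
            episodes.append(LAEpisode(label, [action]))
        prev_gripper = action[3]
    return episodes


# Toy demonstration: descend, grasp, lift, then move forward.
demo = [
    (0.0, 0.0, -0.02, 0),
    (0.0, 0.0, -0.02, 0),
    (0.0, 0.0, 0.0, 1),
    (0.0, 0.0, 0.03, 1),
    (0.05, 0.0, 0.0, 1),
]
for episode in segment_trajectory(demo):
    print(episode.description, len(episode.actions))
```

Each resulting pair of description and action segment contains no visual observation, which is the property the abstract highlights. A real system would likely need a more careful segmentation rule and richer descriptions than this template-based heuristic.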

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.27295

Related Stories

Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?

Monte Carlo Tree Search with Tensor Factorization for Optimization Problems in Robotics

A System for Fast, Resilient, and Adaptable Loco-Manipulation Behaviors on Humanoid Robots

FC-Vision: Real-Time Visibility-Aware Replanning for Occlusion-Free Aerial Target Structure Scanning in Unknown Environments
