This story was reported by arXiv cs.RO.

Learning to Move Before Learning to Do: Task-Agnostic Pretraining for VLAs

Robos News Newsroom

Editorial Desk

Published July 3, 2026 · Category: Robotics

Overview

Abstract (arXiv:2607.02466v1, new submission): Vision-Language-Action (VLA) models are fundamentally bottlenecked by the scarcity of expert demonstrations, that is, triplets of observations, instructions, and actions, which are costly to collect at scale. We argue that this bottleneck stems from conflating two distinct learning objectives: acquiring physical competence (how to move) and acquiring semantic alignment (what to do). Crucially, only the latter requires language supervision. Building on this Decomposition Hypothesis, we propose Task-Agnostic Pretraining (TAP), a two-stage framework. The first stage learns transferable motor priors from cheap, unlabeled interaction data (including discarded off-task trajectories and autonomous robot play) via a self-supervised Inverse Dynamics objective. A lightweight second stage then grounds these priors in language using minimal expert data. On the SIMPLER benchmark, TAP matches models trained on over 1M expert trajectories while using orders of magnitude less labeled data, and yields a 10% absolute gain over standard behavior cloning. On a real-world WidowX platform, TAP retains 25% success under camera perturbations where internet-scale baselines collapse to 0%, demonstrating that task-agnostic pretraining produces robust, transferable physical representations and offers a scalable path forward for Embodied AI.
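To make the decomposition concrete, the Python below is a deliberately tiny toy sketch. It is not the TAP architecture: the linear 2-D "robot", the least-squares model, and the names `ACTUATION`, `collect_play`, `fit_inverse_dynamics`, `ground_language`, and `act` are all hypothetical, chosen only to illustrate the two stages. Stage one fits an inverse dynamics model, which predicts the action a_t that moved observation o_t to o_{t+1}, from play transitions that carry no instructions. Stage two uses a handful of language-labelled demonstrations only to learn what displacement each instruction asks for, and the policy composes the two.

```python
import random

# Hypothetical toy robot: observations are 2-D positions, actions are 2-D motor
# commands, and an actuation matrix the learner never sees maps commands to
# displacements: next_obs = obs + ACTUATION @ action.
ACTUATION = [[0.5, -0.2], [0.1, 0.8]]


def step(obs, action):
    """Apply the toy dynamics to a single observation."""
    return [
        obs[0] + ACTUATION[0][0] * action[0] + ACTUATION[0][1] * action[1],
        obs[1] + ACTUATION[1][0] * action[0] + ACTUATION[1][1] * action[1],
    ]


def collect_play(n, rng):
    """Stage-1 data: random 'play' transitions (o_t, o_t+1, a_t), no instructions."""
    data = []
    for _ in range(n):
        obs = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        action = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((obs, step(obs, action), action))
    return data


def fit_inverse_dynamics(transitions):
    """Stage 1: least-squares linear inverse dynamics a_t ~ W (o_t+1 - o_t).

    Closed form W = (sum a d^T)(sum d d^T)^-1 with d = o_t+1 - o_t.
    """
    ad = [[0.0, 0.0], [0.0, 0.0]]
    dd = [[0.0, 0.0], [0.0, 0.0]]
    for obs, next_obs, action in transitions:
        d = [next_obs[0] - obs[0], next_obs[1] - obs[1]]
        for i in range(2):
            for j in range(2):
                ad[i][j] += action[i] * d[j]
                dd[i][j] += d[i] * d[j]
    det = dd[0][0] * dd[1][1] - dd[0][1] * dd[1][0]
    inv = [[dd[1][1] / det, -dd[0][1] / det], [-dd[1][0] / det, dd[0][0] / det]]
    return [[sum(ad[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ground_language(demos):
    """Stage 2: map each instruction to the mean displacement in its few demos."""
    sums, counts = {}, {}
    for instruction, obs, next_obs in demos:
        s = sums.setdefault(instruction, [0.0, 0.0])
        s[0] += next_obs[0] - obs[0]
        s[1] += next_obs[1] - obs[1]
        counts[instruction] = counts.get(instruction, 0) + 1
    return {k: [v[0] / counts[k], v[1] / counts[k]] for k, v in sums.items()}


def act(W, goals, instruction):
    """Compose 'what to do' (goal displacement) with 'how to move' (W)."""
    d = goals[instruction]
    return [W[0][0] * d[0] + W[0][1] * d[1], W[1][0] * d[0] + W[1][1] * d[1]]


rng = random.Random(0)
W = fit_inverse_dynamics(collect_play(500, rng))  # cheap, instruction-free data

# Scarce labelled data: two demos per instruction, recorded as
# (instruction, observation before, observation after).
demos = []
for instruction, target in [("move right", [0.3, 0.0]), ("move up", [0.0, 0.3])]:
    for _ in range(2):
        obs = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        demos.append((instruction, obs, [obs[0] + target[0], obs[1] + target[1]]))

goals = ground_language(demos)
result = step([0.0, 0.0], act(W, goals, "move right"))
print(result)  # approximately [0.3, 0.0], up to floating-point error
```

Because the toy dynamics are linear and noise-free, the least-squares fit recovers the inverse of the actuation matrix almost exactly from play data alone, so two demonstrations per instruction are enough for grounding. In a real VLA both stages would presumably be neural networks operating on camera images, and neither simplification would hold.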

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2607.02466

Related Stories

Choreographing the Way of Water: A Computational Framework for Aquatic Robotic Art

Learning to Localize Reference Trajectories in Image-Space for Visual Navigation

BIEVR-LIO: Robust LiDAR-Inertial Odometry through Bump-Image-Enhanced Voxel Maps

Simulation Based Reward Function Validation for Multi-Agent On Orbit Inspection
