EquiVLA: A General Framework for Rotationally Equivariant Vision-Language-Action Models

Robos News Newsroom

Editorial Desk

Published June 19, 2026 · Category: Robotics

Overview

arXiv:2606.19784v1 (announce type: new)

Abstract: Vision-Language-Action (VLA) models have emerged as a powerful paradigm for generalist robot manipulation, yet they lack geometric inductive biases: policies trained at specific orientations require substantially more data to generalize across rotational configurations. We present EquiVLA, the first general framework for end-to-end SO(2)-equivariant VLA models, applicable to any architecture coupling a frozen vision-language backbone with a flow-matching Diffusion Transformer action head. EquiVLA introduces EquiPerceptor, which produces approximately SO(2)-equivariant visual representations from frozen ViT features; and EquiActor, an exactly SO(2)-equivariant flow-matching Diffusion Transformer action head. Together, they establish an approximate SO(2) equivariance chain from camera observations to predicted action sequences. Instantiated on GR00T N1.5 and evaluated across four LIBERO suites, CALVIN ABCD→D, and five real-robot tasks on Mobile ALOHA, EquiVLA achieves 92.6% average success on LIBERO (vs. 78.1% baseline), an average sequence length of 4.03 on CALVIN (vs. 3.45), and improves real-robot success from 54% to 72%.
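To make the term "SO(2)-equivariant" concrete: a map f is equivariant to planar rotations if rotating its inputs by an angle and then applying f gives the same result as applying f first and rotating the output, i.e. f(R x) = R f(x). The sketch below is a toy illustration only, not the paper's EquiPerceptor or EquiActor. It assumes a 2D observation vector, a 2D action, and a hypothetical hand-written velocity field (`velocity`) that is equivariant by construction, then integrates it with plain Euler steps in the style of flow-matching sampling. All function names are illustrative assumptions.

```python
import math


def rotate(v, theta):
    """Rotate a 2D vector v = (x, y) counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def velocity(action, obs, t):
    """Hypothetical toy velocity field, equivariant by construction.

    It multiplies a rotation-invariant scalar (built from the squared
    norm of obs) by a vector that rotates with its inputs (obs - action).
    """
    gain = 1.0 + t * (obs[0] ** 2 + obs[1] ** 2)
    return (gain * (obs[0] - action[0]), gain * (obs[1] - action[1]))


def sample_action(obs, noise, steps=10):
    """Euler-integrate the velocity field from t = 0 to t = 1, starting at noise."""
    action = noise
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(action, obs, k * dt)
        action = (action[0] + dt * v[0], action[1] + dt * v[1])
    return action


def equivariance_error(theta, obs, noise):
    """Largest coordinate gap between 'rotate then sample' and 'sample then rotate'."""
    rotated_first = sample_action(rotate(obs, theta), rotate(noise, theta))
    rotated_last = rotate(sample_action(obs, noise), theta)
    return max(abs(rotated_first[0] - rotated_last[0]),
               abs(rotated_first[1] - rotated_last[1]))


if __name__ == "__main__":
    err = equivariance_error(0.7, obs=(0.3, -1.2), noise=(0.5, 0.1))
    print(f"equivariance error: {err:.2e}")
```

Because every Euler step only combines rotation-invariant scalars with vectors that rotate together, the error stays at floating-point round-off. Note that the starting noise is rotated along with the observation; in a sampler that draws isotropic Gaussian noise, the equivalent statement is that the distribution of sampled actions rotates with the input.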

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.19784
