Who reported this story?

This story was reported by arXiv cs.RO.

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

Robos News Newsroom

Editorial Desk

Published July 3, 2026 · Category: Robotics

Overview

arXiv:2512.22539v3 (announce type: replace)

Abstract: While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation. This allows us to systematically design tasks with fine-grained difficulty levels, enabling a precise measurement of model capability frontiers.

For Task Structure, VLA-Arena's 170 tasks are grouped into four dimensions: Safety, Distractor, Extrapolation, and Long Horizon. Each task is designed with three difficulty levels (L0-L2), with fine-tuning performed exclusively on L0 to assess general capability. Orthogonal to this, language (W0-W4) and visual (V0-V4) perturbations can be applied to any task to enable a decoupled analysis of robustness.

Our extensive evaluation of state-of-the-art VLAs reveals several critical limitations, including a strong tendency toward memorization over generalization, asymmetric robustness, a lack of consideration for safety constraints, and an inability to compose learned skills for long-horizon tasks.

To foster research addressing these challenges and ensure reproducibility, we provide the complete VLA-Arena framework, including an end-to-end toolchain from task definition to automated evaluation and the VLA-Arena-S/M/L datasets for fine-tuning. Our benchmark, data, models, and leaderboard are available at https://vla-arena.github.io.
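The three-axis design described in the abstract can be pictured as a small grid of evaluation conditions per task. The Python sketch below is only an illustration of that idea. The class and function names (`EvalCondition`, `conditions_per_task`) are hypothetical and are not taken from the VLA-Arena toolchain, and combining every difficulty level with every language and visual level is an assumption, since the abstract says only that perturbations can be applied to any task.

```python
from dataclasses import dataclass
from itertools import product

# All names here are hypothetical; this is not the VLA-Arena API.
DIMENSIONS = ("Safety", "Distractor", "Extrapolation", "Long Horizon")
DIFFICULTY_LEVELS = ("L0", "L1", "L2")
LANGUAGE_LEVELS = tuple(f"W{i}" for i in range(5))  # W0-W4
VISUAL_LEVELS = tuple(f"V{i}" for i in range(5))    # V0-V4


@dataclass(frozen=True)
class EvalCondition:
    """One point on the difficulty / language / visual grid."""
    difficulty: str
    language: str
    visual: str

    @property
    def is_finetune_level(self) -> bool:
        # The abstract states that fine-tuning uses L0 only.
        return self.difficulty == "L0"


def conditions_per_task() -> list[EvalCondition]:
    # Assumption: a full cross product of the three axes.
    return [
        EvalCondition(d, w, v)
        for d, w, v in product(DIFFICULTY_LEVELS, LANGUAGE_LEVELS, VISUAL_LEVELS)
    ]


conditions = conditions_per_task()
held_out = [c for c in conditions if not c.is_finetune_level]
print(len(conditions), len(held_out))
```

Under the full cross-product assumption, each task has 3 x 5 x 5 = 75 conditions, 50 of which sit above the L0 fine-tuning level. A decoupled robustness analysis might instead vary one axis at a time while holding the others at their base level, which would need far fewer runs.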

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2512.22539

Related Stories

Choreographing the Way of Water: A Computational Framework for Aquatic Robotic Art

Learning to Localize Reference Trajectories in Image-Space for Visual Navigation

BIEVR-LIO: Robust LiDAR-Inertial Odometry through Bump-Image-Enhanced Voxel Maps

Simulation Based Reward Function Validation for Multi-Agent On Orbit Inspection