This story was reported by arXiv cs.RO.

Eval-Actions: Fine-Grained Execution Quality Evaluation for Robotic Manipulation

Robos News Newsroom

Editorial Desk

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2601.18723v2

Although Vision-Action (VA) and Vision-Language-Action (VLA) policies have advanced robotic manipulation, their evaluation remains dominated by binary success rates, which obscure process-level differences among executions that complete the same task. We introduce Eval-Actions, a diagnostic evaluation methodology and real-robot benchmark for fine-grained execution-quality assessment of learned manipulation policies.

Eval-Actions combines criteria-based Expert Grading (EG), Rank-Guided (RG) labels that align measurable motion indicators with expert rankings, and Chain-of-Thought-style (CoT) annotations that explain observable quality differences. The benchmark contains 13K+ teleoperated and policy-generated real-robot episodes covering 150+ tasks and approximately 52 hours of recordings with RGB-D videos, robot-state trajectories, task descriptions, and success/failure labels. Its densely annotated subset provides EG/RG/CoT supervision for training and evaluation.

We further provide AutoEval, a reference multimodal evaluator that predicts quality scores, task outcomes, and diagnostic explanations from RGB temporal evidence and compact kinematic summaries. On the annotated Eval-Actions test split, AutoEval-S achieves Spearman rank correlations (SRCCs) of 0.81 and 0.84 under EG and RG, with success detection accuracies of 90.6% and 91.0%; AutoEval-P reaches 0.70 SRCC under CoT.

Analyses of expert consistency, physical-metric baselines, modality ablations, structured generalization, and offline policy ranking show that Eval-Actions provides standardized, interpretable diagnostic signals complementary to success-rate evaluation.
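For readers unfamiliar with the two headline metrics, the sketch below shows how a Spearman rank correlation (SRCC) and a success detection accuracy could be computed for an evaluator's outputs. It is a minimal illustration, not the authors' evaluation code: the function names and all scores and labels are hypothetical, and the paper's exact protocol (for example, how ties or per-task splits are handled) is not described in the abstract.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, reference):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(reference))


def success_accuracy(predicted, actual):
    """Fraction of episodes whose predicted outcome matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical illustrative values, not data from the paper.
expert_scores = [9.0, 7.5, 6.0, 4.0, 2.5]
evaluator_scores = [8.2, 7.9, 5.1, 5.5, 1.0]
predicted_success = [True, True, False, True]
labelled_success = [True, False, False, True]

print(f"SRCC: {srcc(evaluator_scores, expert_scores):.2f}")
print(f"Accuracy: {success_accuracy(predicted_success, labelled_success):.2f}")
```

In this toy data the evaluator swaps the order of only two adjacent episodes, so the rank correlation stays high even though the raw scores differ from the expert's. That is the property that makes SRCC a natural fit for comparing quality rankings rather than absolute score values.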

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2601.18723
