EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

Robos News Newsroom

Editorial Desk

Published June 17, 2026 · Category: Robotics

Overview

Abstract (arXiv:2606.18239v1): We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulation models, including $\pi_0$, $\pi_{0.5}$, XVLA, and InternVLA-A1, and find that models with similar success rates exhibit strikingly different capability profiles: $\pi_{0.5}$ achieves the highest test success rate and the best train-test retention, whereas InternVLA-A1 dominates mobile manipulation but collapses on dexterous tasks, and XVLA shows strengths on a set of atomic skills disjoint from those of the other policies. Beyond capability profiling, EBench analyzes generalization ability from 4 representative perspectives, identifying the impact of different distribution-shift factors. The results reveal the strengths and weaknesses of models that an overall score hides. We hope this benchmark offers a broad set of diagnostic signals to guide iteration on generalist manipulation models.
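To make the idea of diagnosing "beyond a single success-rate scalar" concrete, the following is a minimal sketch, not EBench's actual code or data. It assumes each evaluation episode carries a set of capability tags and a success flag. The tag names (`mobile`, `dexterous`), the episode records, the function names, and the definition of retention as test success divided by train success are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict


def overall_success(episodes):
    """Single scalar: fraction of episodes that succeeded."""
    return sum(ep["success"] for ep in episodes) / len(episodes)


def capability_profile(episodes):
    """Per-tag success rate over all episodes carrying that tag."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for ep in episodes:
        for tag in ep["tags"]:
            totals[tag] += 1
            wins[tag] += int(ep["success"])
    return {tag: wins[tag] / totals[tag] for tag in totals}


def retention(train_rate, test_rate):
    """Assumed definition: share of train success kept at test time."""
    return test_rate / train_rate if train_rate > 0 else 0.0


# Toy, made-up episodes for two hypothetical policies (not real results).
policy_a = [
    {"tags": ["mobile"], "success": True},
    {"tags": ["mobile"], "success": True},
    {"tags": ["dexterous"], "success": False},
    {"tags": ["dexterous"], "success": False},
]
policy_b = [
    {"tags": ["mobile"], "success": True},
    {"tags": ["mobile"], "success": False},
    {"tags": ["dexterous"], "success": True},
    {"tags": ["dexterous"], "success": False},
]

for name, episodes in [("policy_a", policy_a), ("policy_b", policy_b)]:
    print(name, overall_success(episodes), capability_profile(episodes))
```

In this toy data both policies have the same overall success rate, yet the per-tag breakdown separates a policy that succeeds only on `mobile` episodes from one that is uniformly mediocre, which is the kind of difference a single scalar cannot show.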

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.18239
