Who reported this story?

This story was reported by arXiv cs.RO.

Transport Discrepancy as a Reliability Signal for Vision-Language-Action Models

Robos News Newsroom

Editorial Desk

Published July 3, 2026 · Category: Robotics

Overview

Abstract (arXiv:2512.01715v2): Vision-language-action (VLA) models that generate continuous action chunks via flow matching lack an internal signal for judging whether a given prediction is reliable. Distribution shift and long-horizon rollouts can push backbone representations away from the region the action head decodes reliably, yet the policy has no mechanism to detect or react to this drift. We observe that the cost of transporting observation features to the action representation in a shared feature space rises precisely when such drift occurs, providing a per-step reliability estimate without extra supervision. Building on this observation, we propose DiG (Discrepancy Gate), a lightweight plug-in module for flow-matching VLA policies. DiG computes a sliced Wasserstein transport cost between backbone features and the action expert's own input projection, maps it through an exponential gate, and uses the gate to modulate both a residual feature refinement and the training loss. At inference time, the gate enables DiG-Refine, an iterative refinement process that corrects action chunks before execution. Experiments in both simulation and real-world scenarios show that DiG consistently improves success rates, with the largest gains under distribution shift and on long-horizon tasks.
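To make the two central quantities in the abstract concrete, here is a minimal NumPy sketch of a sliced Wasserstein cost followed by an exponential gate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the number of random projections, the `temperature` parameter, and the specific gate form exp(-cost / temperature) are all hypothetical choices, since the abstract gives no formula for the gate. The sketch also assumes both feature sets contain the same number of vectors in the same dimension.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, rng=None):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) holding two equal-size sets of feature vectors.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    # Draw random unit directions on the sphere in R^d.
    dirs = rng.normal(size=(n_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both sets onto every direction, then sort each 1-D projection.
    # In one dimension, optimal transport pairs sorted samples with each other.
    px = np.sort(x @ dirs.T, axis=0)
    py = np.sort(y @ dirs.T, axis=0)
    return float(np.mean((px - py) ** 2))


def exponential_gate(cost, temperature=1.0):
    """Map a non-negative transport cost to a value in (0, 1].

    ASSUMPTION: the gate is exp(-cost / temperature), so it equals 1 at
    zero cost and shrinks as the cost grows. The source does not specify this.
    """
    return float(np.exp(-cost / temperature))
```

Under these assumptions, identical feature sets give a cost of 0 and a gate of 1, while features that have drifted away from the reference set give a larger cost and a gate closer to 0. How the gate then modulates the residual refinement and the training loss is not specified in the abstract, so that step is left out of the sketch.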

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2512.01715
