Who reported this story?

This story was reported by arXiv cs.RO.

STEAM: Self-Supervised Temporal Ensemble Advantage Modeling for Real-World Robot Learning

Robos News Newsroom

Editorial Desk

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2606.29834v1

Abstract: Real-world robot learning increasingly relies on heterogeneous data, but demonstrations and rollouts often mix useful progress with stalls, corrections, and suboptimal behavior. Effective policy learning therefore requires frame-level advantages that distinguish reliable local progress from failures and regressions. We propose Self-supervised Temporal Ensemble Advantage Modeling (STEAM), a label-free method that learns such advantages from expert demonstrations. STEAM trains an ensemble of temporal-offset predictors on frame pairs within expert trajectories, using the normalized temporal offset between two frames as a self-supervised signal. Each predictor maps a frame pair to a distribution over temporal offsets, which is converted into a scalar advantage. STEAM then takes the minimum advantage across the ensemble to score mixed-quality rollout data conservatively. Across real-world bimanual towel folding, chip checkout, cola restocking, and single-arm pick-and-place tasks, STEAM identifies stalls, failures, and recoveries. When combined with CFGRL, STEAM further improves policy success rate by 59%, 54.3%, 23% and 16.2% over baselines, respectively.
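The scoring pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the normalization by trajectory length, the five offset bins, the use of the expected offset as the scalar advantage, and all function names are assumptions made for illustration, since the abstract does not specify them. Only the overall shape (a temporal-offset target, a per-predictor distribution collapsed to a scalar, and a minimum over the ensemble) comes from the text.

```python
from typing import Sequence


def normalized_offset(i: int, j: int, length: int) -> float:
    """Self-supervised target for frame pair (i, j) in one expert trajectory.

    Assumption: the signed frame gap divided by (length - 1), so the
    target lies in [-1, 1]. The abstract only says "normalized".
    """
    if length < 2:
        raise ValueError("trajectory needs at least two frames")
    if not (0 <= i < length and 0 <= j < length):
        raise ValueError("frame index out of range")
    return (j - i) / (length - 1)


def expected_offset(probs: Sequence[float], bin_centers: Sequence[float]) -> float:
    """Collapse a distribution over offset bins into a scalar advantage.

    Assumption: the scalar is the expected offset under the distribution.
    """
    if len(probs) != len(bin_centers):
        raise ValueError("probs and bin_centers must have the same length")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(p * c for p, c in zip(probs, bin_centers)) / total


def conservative_advantage(
    ensemble_probs: Sequence[Sequence[float]],
    bin_centers: Sequence[float],
) -> float:
    """Score a frame pair by the minimum advantage across ensemble members."""
    if not ensemble_probs:
        raise ValueError("ensemble must contain at least one predictor output")
    return min(expected_offset(p, bin_centers) for p in ensemble_probs)


# Hypothetical bin centers for normalized offsets in [-1, 1].
BIN_CENTERS = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Training target for frames 2 and 6 of a 9-frame expert trajectory.
target = normalized_offset(2, 6, 9)

# Hypothetical outputs of three predictors for one rollout frame pair.
ensemble_probs = [
    [0.0, 0.0, 0.0, 1.0, 0.0],  # confident forward progress
    [0.0, 0.0, 0.5, 0.5, 0.0],  # mild forward progress
    [0.0, 0.5, 0.5, 0.0, 0.0],  # suspects a regression
]
score = conservative_advantage(ensemble_probs, BIN_CENTERS)
print(target, score)
```

In this toy example the ensemble disagrees, and taking the minimum lets the one pessimistic predictor decide the score, which is the sense in which the abstract calls the scoring conservative.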

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.29834

