This story was reported by arXiv cs.RO.

WARP-RM: A Warp-Augmented Relative Progress Reward Model for Data Curation

Robos News Newsroom

Editorial Desk

Published June 29, 2026 · Category: Robotics

Overview

arXiv:2606.28320v1

Scaling imitation learning requires large datasets, yet human teleoperation inevitably produces mixed-quality demonstrations containing hesitations and recoveries. Prior frame-level progress reward models supervise on absolute temporal progress proxies that suffer from label noise, or require costly human annotations to define subtask boundaries.

We present WARP (Warp-Augmented Relative Progress), a novel fully self-supervised algorithm for learning dense, signed relative progress magnitudes directly from successful demonstrations. WARP generates per-frame progress targets via time-warp augmentations of demonstrations (variable playback speeds and reversals) and we train WARP-RM to predict the normalized elapsed time between input frames. Aggregating these predictions across overlapping windows yields a dense frame-level progress signal. We then introduce WARP-BC, which leverages these scalar reward estimates to upweight high-advantage action chunks during behavior cloning, where chunk-level advantage is obtained by aggregating per-frame rewards.

We evaluate our approach on a physical bimanual robot system performing a long-horizon deformable object manipulation task: folding T-shirts from a random crumpled start. To evaluate policy robustness against suboptimal data, we construct training datasets of varying quality using episode length as a proxy for teleoperation sub-optimality. As the dataset is widened to admit more inefficiencies, WARP-BC maintains a 19/20 success rate compared to vanilla BC's collapse to 2/20, improving throughput by up to 18x.
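To make the time-warp idea concrete, here is a minimal Python sketch of how such training targets and the window aggregation might look. It is an illustration under stated assumptions, not the authors' implementation: the helper names `sample_warp_pair` and `aggregate_per_frame`, the `max_stride` parameter, the normalization by episode length, and the simple averaging rule are all hypothetical choices, since the abstract does not give these details.

```python
import random


def sample_warp_pair(num_frames, max_stride=4, rng=random):
    """Pick two frame indices from a time-warped replay of one demonstration.

    Hypothetical helper. The stride stands in for playback speed and the sign
    for forward vs. reversed playback. The target is the signed elapsed time
    between the two frames, normalized by episode length (an assumption).
    """
    if num_frames <= max_stride:
        raise ValueError("episode must be longer than max_stride")
    stride = rng.randint(1, max_stride)   # playback speed, in frames
    direction = rng.choice([1, -1])       # -1 means reversed playback
    step = stride * direction
    # Choose a start index so that start + step stays inside the episode.
    lo = max(0, -step)
    hi = min(num_frames - 1, num_frames - 1 - step)
    start = rng.randint(lo, hi)
    end = start + step
    target = (end - start) / (num_frames - 1)  # signed, within [-1, 1]
    return start, end, target


def aggregate_per_frame(num_frames, window_preds):
    """Average overlapping window predictions into per-step progress.

    window_preds: iterable of (start, end, predicted_delta), start != end.
    Each prediction is spread evenly over the frame transitions it covers,
    and transitions covered by several windows receive the mean. This
    averaging rule is an assumption, not a detail from the abstract.
    """
    sums = [0.0] * (num_frames - 1)
    counts = [0] * (num_frames - 1)
    for start, end, delta in window_preds:
        lo, hi = min(start, end), max(start, end)
        # A reversed window carries a negated delta; flip it to forward time.
        forward_delta = delta if end > start else -delta
        per_step = forward_delta / (hi - lo)
        for t in range(lo, hi):
            sums[t] += per_step
            counts[t] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Usage: three overlapping windows with ideal predictions on a 5-frame episode.
steps = aggregate_per_frame(5, [(0, 2, 0.5), (1, 3, 0.5), (4, 2, -0.5)])
print(steps)  # [0.25, 0.25, 0.25, 0.25]
```

In this toy case every transition gets equal progress because the window predictions are ideal. With a learned model, hesitations or recoveries would presumably show up as near-zero or negative per-step values, which is the kind of signal the abstract describes WARP-BC using when it weights action chunks.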

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.28320
