VOLT: Vision and Language Trajectory Segmentation for Faster-than-Demonstration Policies

Robos News Newsroom

Editorial Desk

2026-06-10 · 2 min read

Published June 10, 2026 · Category: Robotics

Overview

Abstract (arXiv:2606.06323v2): Humans often take longer to demonstrate a task than a robot would need to execute it. Rather than learning to replicate the demonstration at the same pace, many industrial and practical applications require robots to perform tasks as quickly as possible. In this paper, we investigate several hypotheses for learning policies that operate faster than the demonstrations. Our experiments show that the most effective strategy is to downsample the recorded demonstrations and train the robot's policy on this accelerated data.

However, uniformly downsampling an entire trajectory can be problematic. Some parts of a task can be safely sped up (e.g., unconstrained motion), while others demand slower, more precise motion (e.g., object interactions or fine manipulation). To address this challenge, we introduce VOLT, a vision-and-language trajectory segmentation method that reasons over video demonstrations and leverages contextual cues to determine when acceleration is appropriate and when careful precision is required. VOLT identifies segments where slow, deliberate motion is necessary, then selectively downsamples the remaining segments. The resulting reformatted trajectories can be used with standard imitation learning approaches, such as diffusion policies.

Our results highlight that segmentation quality is critical: baseline methods often misidentify when acceleration is possible, leading to overly cautious or unreliable policies. Compared with state-of-the-art alternatives, VOLT allows robots to execute tasks faster while maintaining strong performance.
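To make the difference between uniform and segment-aware downsampling concrete, here is a minimal sketch operating on timestep indices. It is not the authors' code: the function names `uniform_downsample` and `selective_downsample`, the `(start, end, precise)` segment format, and the fixed `factor` are all hypothetical. In VOLT the precise/free labels come from vision-and-language reasoning over the demonstration video; here they are simply supplied by hand.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment label: (start, end, precise), end exclusive.
# precise=True marks a span that should keep its original timing.
Segment = Tuple[int, int, bool]


def uniform_downsample(traj: Sequence, factor: int) -> list:
    """Keep every `factor`-th timestep of the whole trajectory."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(traj[::factor])


def selective_downsample(traj: Sequence, segments: List[Segment], factor: int) -> list:
    """Keep precise segments at full rate; downsample the others by `factor`."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    expected_start = 0
    for start, end, precise in segments:
        # Segments must tile the trajectory in order with no gaps or overlaps.
        if start != expected_start or end <= start:
            raise ValueError("segments must be contiguous, non-empty, and ordered")
        chunk = list(traj[start:end])
        # Free-motion chunks are sped up; precise chunks are copied unchanged.
        out.extend(chunk if precise else chunk[::factor])
        expected_start = end
    if expected_start != len(traj):
        raise ValueError("segments must cover the whole trajectory")
    return out


if __name__ == "__main__":
    traj = list(range(12))  # stand-in for 12 recorded timesteps
    segs = [(0, 6, False), (6, 9, True), (9, 12, False)]
    print(uniform_downsample(traj, 3))          # [0, 3, 6, 9]
    print(selective_downsample(traj, segs, 3))  # [0, 3, 6, 7, 8, 9]
```

Uniform downsampling drops timesteps 7 and 8 even though they fall inside the span marked precise, while the selective version keeps that span intact and only compresses the free-motion spans. This sketch only subsamples timesteps. If a dataset stores actions as per-step deltas, the deltas over skipped steps would also need to be accumulated, whereas absolute targets such as end-effector poses can be kept as they are.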

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.06323

Robos News reports on robotics research, components, manufacturers, field deployments, and industrial automation worldwide.

Reporting standard: Product specifications, deployment counts, and performance claims are attributed to their source. Safety-critical decisions should be based on the applicable technical documentation and validation for the operating environment.
