Who reported this story?

This story was reported by arXiv cs.RO.

Pondering the Way: Spatial-perceiving World Action Model for Embodied Navigation

Robos News Newsroom

Editorial Desk

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2606.29908v1 (announce type: new)

Existing world-model-based planners for visual navigation typically follow a verification-centric paradigm that decouples goal intent from trajectory synthesis. This approach suffers from candidate dependence, heavy computational overhead, and inconsistencies between sampled actions and predicted visuals.

To address these issues, the authors propose SWAM (Spatial-perceiving World Action Model), a task-centric joint observation-action generation framework. Given start and goal RGB observations, SWAM performs single-pass inference to simultaneously generate intermediate RGB-D sequences and the corresponding action trajectories, promoting goal-consistent trajectory generation and improved spatial feasibility. SWAM uses depth pseudo-labels during training to internalize spatial priors, but requires only monocular RGB input at inference time.

The authors further introduce a visual-guided action refinement module and a trajectory-scale regularization loss to enforce fine-grained alignment between motion and visual cues while stabilizing predictions across varying distances. According to the abstract, extensive experiments show that SWAM significantly outperforms state-of-the-art two-stage planners in success rate, trajectory accuracy, and inference efficiency, while demonstrating robust zero-shot generalization to unseen environments.
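To make the contrast between the two paradigms concrete, here is a minimal toy sketch in Python. It is not SWAM and not any published planner: the 2D point "world model", the function names, and the straight-line "joint" generator are all hypothetical stand-ins chosen only to illustrate the interfaces described in the abstract. The verification-centric planner samples candidate action sequences and checks each with a rollout, while the joint planner takes start and goal and returns intermediate states and actions together from one call.

```python
import random


def toy_world_model(state, actions):
    """Hypothetical stand-in for a world model: roll a 2D position forward."""
    x, y = state
    for dx, dy in actions:
        x, y = x + dx, y + dy
    return (x, y)


def verification_centric_plan(start, goal, num_candidates=16, horizon=4, seed=0):
    """Sample candidates, verify each with a rollout, keep the best one.

    The result depends on which candidates were sampled, and the cost grows
    with the number of rollouts (one per candidate).
    """
    rng = random.Random(seed)
    best_actions, best_dist, rollouts = None, float("inf"), 0
    for _ in range(num_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        end = toy_world_model(start, actions)
        rollouts += 1
        dist = ((end[0] - goal[0]) ** 2 + (end[1] - goal[1]) ** 2) ** 0.5
        if dist < best_dist:
            best_actions, best_dist = actions, dist
    return best_actions, best_dist, rollouts


def joint_single_pass_plan(start, goal, horizon=4):
    """Toy joint generator: one call returns states AND actions together.

    Assumption: a straight-line interpolation replaces the learned model, so
    the predicted states and the actions are consistent by construction.
    """
    step = ((goal[0] - start[0]) / horizon, (goal[1] - start[1]) / horizon)
    actions = [step] * horizon
    states = [(start[0] + step[0] * (i + 1), start[1] + step[1] * (i + 1))
              for i in range(horizon)]
    return states, actions, 1  # 1 = number of model calls


if __name__ == "__main__":
    start, goal = (0.0, 0.0), (2.0, 1.0)
    _, dist, rollouts = verification_centric_plan(start, goal)
    states, actions, calls = joint_single_pass_plan(start, goal)
    print("verification-centric:", rollouts, "rollouts, final distance", round(dist, 3))
    print("joint single pass:", calls, "call, final state", states[-1])
```

In this toy setting the verification-centric planner needs one rollout per candidate and can only be as good as its samples, whereas the joint generator produces a goal-conditioned trajectory in a single call. How SWAM actually achieves this with a learned RGB-D and action generator is described in the paper itself.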

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2606.29908
