Who reported this story?

This story was reported by arXiv cs.RO.

Robotics

MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning

Robos News Newsroom

Editorial Desk

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2510.03142v2. Abstract: Visual navigation policies are widely regarded as a promising direction because they mimic humans by navigating from egocentric visual observations. However, the optical information in visual observations is difficult to model explicitly, unlike LiDAR point clouds or depth maps, so this approach requires intelligent models and large-scale data. To this end, we propose to leverage the intelligence of the Vision-Language-Action (VLA) model to learn diverse navigation capabilities from synthetic expert data in a teacher-student manner. Specifically, we implement the VLA model, MM-Nav, as a multi-view VLA (with 360° observations) based on pretrained large language models and visual foundation models. For large-scale navigation data, we collect expert data from three reinforcement learning (RL) experts trained with privileged depth information in three challenging tailor-made environments, each targeting a different navigation capability: reaching, squeezing, and avoiding. We iteratively train our VLA model on data collected online from the RL experts, with the training ratio dynamically balanced according to performance on each capability. Through extensive experiments in synthetic environments, we demonstrate that our model achieves strong generalization capability. Moreover, we find that our student VLA model outperforms the RL teachers, demonstrating the synergistic effect of integrating multiple capabilities. Extensive real-world experiments further confirm the effectiveness of our method.
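The abstract does not say how the training ratio is balanced across the three capabilities. As a rough illustration only, the sketch below assumes one simple rule: each capability's share of the next round of expert data is proportional to the student's current failure rate on it. The function names, the `floor` parameter, and the example success rates are all hypothetical and are not taken from the paper.

```python
def balance_ratios(success_rates, floor=0.05):
    """Map per-capability success rates to sampling ratios that sum to 1.

    Assumed rule (not from the paper): weight each capability by its
    failure rate, with a small floor so no capability is dropped entirely.
    """
    gaps = {name: max(1.0 - rate, floor) for name, rate in success_rates.items()}
    total = sum(gaps.values())
    return {name: gap / total for name, gap in gaps.items()}


def allocate_samples(ratios, budget):
    """Split an integer sample budget across capabilities by ratio."""
    counts = {name: int(budget * ratio) for name, ratio in ratios.items()}
    # Rounding down can leave a few samples unassigned; give them to the
    # capability with the largest ratio (the weakest one).
    weakest = max(ratios, key=ratios.get)
    counts[weakest] += budget - sum(counts.values())
    return counts


# Hypothetical student success rates after one training round.
rates = {"reaching": 0.9, "squeezing": 0.5, "avoiding": 0.7}
ratios = balance_ratios(rates)
counts = allocate_samples(ratios, budget=900)
print(ratios)
print(counts)
```

Under this assumed rule, the weakest capability (here, squeezing) receives the largest share of new expert data in the next iteration, and the shares shift as the student's per-capability performance changes.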

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2510.03142
