FOUND-IT: Foundation-model-first Task-driven 3D Scene Graphs with Granularity on Demand

Robos News Newsroom

Editorial Desk

Published June 10, 2026 · Category: Robotics

Overview

arXiv:2605.25371v2

We present the first approach to build hierarchical task-driven 3D scene graphs of arbitrary indoor or outdoor environments using an uncalibrated monocular camera in real time. We leverage geometric foundation models to estimate geometric attributes of the scene graph (e.g., object bounding boxes), but we also observe that traversability information (the "places" layer of a scene graph) can be directly reconstructed by adding an extra head to existing geometric foundation models, like VGGT.

Our approach is task-driven in the sense that we adjust the granularity of the objects and regions in the map depending on the task; for instance, during a manipulation task, our approach is able to resolve small knobs on a stove, while during a navigation task it can focus on large objects (e.g., the entire stove). However, in a major departure from related work, we consider the realistic case where the list of tasks is not predefined and fixed, but evolves as the robot operates. This naturally allows dealing with complex loco-manipulation tasks, where the robot can dynamically adjust its representation as the task unfolds.

We dub the resulting approach FOUND-IT. FOUND-IT also includes an agentic approach to query information in the scene graph. In addition to achieving 79% higher accuracy on the ASHiTA SG3D task grounding benchmark, we demonstrate that FOUND-IT runs in real time on a ground robot using a Jetson Thor. Furthermore, to highlight the robustness of our method, we demonstrate constructing 3D scene graphs on casually captured realtor apartment tours from YouTube. Code will be made available upon publication.
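
To make the idea of task-dependent granularity concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not say how FOUND-IT selects granularity, so the `Node` class, the `TASK_DEPTH` table, and the rule "one fixed expansion depth per task" are all assumptions made purely for illustration. The sketch shows only that one hierarchy can yield a coarse object list for navigation and a finer one for manipulation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy hierarchical scene graph (hypothetical structure)."""
    name: str
    children: list["Node"] = field(default_factory=list)


def resolve(node: Node, depth: int) -> list[str]:
    """Expand `node` up to `depth` levels; stop early at nodes without parts."""
    if depth == 0 or not node.children:
        return [node.name]
    names: list[str] = []
    for child in node.children:
        names.extend(resolve(child, depth - 1))
    return names


# Assumption: each task maps to a fixed expansion depth. The dict is mutable,
# so new tasks can be registered while the program runs.
TASK_DEPTH = {"navigation": 0, "manipulation": 1}


def objects_for_task(region: Node, task: str) -> list[str]:
    """List the objects in `region` at the granularity assumed for `task`."""
    depth = TASK_DEPTH[task]
    names: list[str] = []
    for obj in region.children:
        names.extend(resolve(obj, depth))
    return names


stove = Node("stove", [Node("knob_left"), Node("knob_right"), Node("oven_door")])
kitchen = Node("kitchen", [stove, Node("fridge")])

print(objects_for_task(kitchen, "navigation"))    # whole objects only
print(objects_for_task(kitchen, "manipulation"))  # stove resolved into parts
```

In this toy version the navigation query returns the stove as a single object, while the manipulation query returns its knobs and door. A task added later (for example `TASK_DEPTH["inspection"] = 1`) can be queried without rebuilding the graph, which loosely mirrors the evolving task list described above.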

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2605.25371
