Sphere-VIO: Fast and Robust Visual-Inertial Odometry via Unified Spherical Representation for Heterogeneous Multi-Camera Systems
Overview
Abstract (arXiv:2606.29910v1): Multi-camera visual-inertial odometry (VIO) overcomes the inherent limitations of pure visual systems by expanding the field of view. However, existing algorithms are typically tailored for fixed camera setups and lack unified compatibility with heterogeneous multi-camera systems. Meanwhile, due to the absence of a unified cross-camera representation and association mechanism, current methods struggle to achieve a balance among robust cross-camera feature tracking, stable depth estimation, and reliable real-time performance. To address these issues, we present Sphere-VIO, a lightweight filter-based VIO framework with unified spherical representation for heterogeneous multi-camera systems. Specifically, we first propose a Unified Spherical Panorama Model (USPM) that supports all standard camera models and enables bidirectional fast mapping between multi-camera images and a shared spherical space without sequential stitching, simplifying cross-camera feature management and improving triangulation efficiency. Second, we design a parallel-accelerated depth-guided semi-direct tracking pipeline, namely Hierarchical Omnidirectional Feature Alignment (HOFA), with global spherical constraints for robust cross-camera matching, and fuse multi-camera depth observations into a standard depth filter for stable initialization. Finally, we develop a multi-camera-adapted ESKF backend that employs spherical bearing residuals and Schur complement marginalization to minimize computational overhead, enabling accurate real-time state estimation on resource-constrained devices. Extensive experiments on public benchmarks and a custom omnidirectional dataset show that Sphere-VIO achieves superior trade-offs between accuracy, robustness, efficiency, and cross-camera generality.
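The abstract does not spell out how USPM or the spherical bearing residual are formulated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an undistorted pinhole camera, a known camera-to-body rotation, and a longitude/latitude parameterisation of the shared sphere. All function names and the tangent-plane form of the residual are hypothetical choices made for illustration.

```python
import math

def pinhole_to_bearing(u, v, fx, fy, cx, cy):
    """Back-project a pixel to a unit bearing in the camera frame (z forward)."""
    x, y, z = (u - cx) / fx, (v - cy) / fy, 1.0
    n = math.sqrt(x * x + y * y + z * z)
    return (x / n, y / n, z / n)

def rotate(R, b):
    """Apply a 3x3 rotation (nested lists) to a 3-vector."""
    return tuple(sum(R[i][j] * b[j] for j in range(3)) for i in range(3))

def bearing_to_sphere(b):
    """Unit bearing -> (longitude, latitude) in radians on the shared sphere."""
    x, y, z = b
    return math.atan2(x, z), math.asin(max(-1.0, min(1.0, y)))

def sphere_to_bearing(lon, lat):
    """Inverse of bearing_to_sphere."""
    return (math.cos(lat) * math.sin(lon), math.sin(lat),
            math.cos(lat) * math.cos(lon))

def bearing_to_pinhole(b, fx, fy, cx, cy):
    """Project a camera-frame bearing to a pixel; None if behind the camera."""
    x, y, z = b
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bearing_residual(measured, predicted):
    """2-D residual: (predicted - measured) expressed in the tangent plane
    at the measured bearing. An assumed form, not taken from the paper."""
    helper = (1.0, 0.0, 0.0) if abs(measured[0]) < 0.9 else (0.0, 1.0, 0.0)
    b1 = _cross(measured, helper)
    n = math.sqrt(sum(c * c for c in b1))
    b1 = tuple(c / n for c in b1)
    b2 = _cross(measured, b1)
    d = tuple(p - m for p, m in zip(predicted, measured))
    return (sum(a * c for a, c in zip(b1, d)),
            sum(a * c for a, c in zip(b2, d)))

# Hypothetical intrinsics and a side-facing camera (90 degrees about the y axis).
K = dict(fx=400.0, fy=400.0, cx=320.0, cy=240.0)
R_body_cam = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]

b_cam = pinhole_to_bearing(320.0, 240.0, **K)   # optical axis of the side camera
lon, lat = bearing_to_sphere(rotate(R_body_cam, b_cam))
```

Because every camera writes into the same longitude/latitude space, a feature leaving one image and entering another keeps a single spherical coordinate, which is the property the abstract attributes to a unified representation. Swapping `pinhole_to_bearing` for a fisheye or omnidirectional back-projection would leave the rest of the sketch unchanged.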
Source
Originally published at arxiv.org.
Source: https://arxiv.org/abs/2606.29910
