🤖 Humanoid 🦾 Industrial & Cobot 🚚 AGV / AMR 🐕 Quadruped ⚙️ Reducers · Servos · Sensors 🚁 Drones & Autonomy 🧠 Embodied AI
Robos News
Robotics

Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation

arXiv:2512.23864v4 Announce Type: replace Abstract: Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address

Published June 30, 2026 · Category: Robotics

Overview

arXiv:2512.23864v4 Announce Type: replace Abstract: Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address this limitation, we introduce DreamTacVLA, a framework that grounds VLA models in contact physics by learning to feel the future. Our model adopts a hierarchical perception scheme in which high-resolution tactile images serve as micro-vision inputs coupled with wrist-camera local vision and third-person macro vision. To reconcile these multi-scale sensory streams, we first train a unified policy with a Hierarchical Spatial Alignment (HSA) loss that aligns tactile tokens with their spatial counterparts in the wrist and third-person views. To further deepen the model's understanding of fine-grained contact dynamics, we finetune the system with a tactile world model that predicts future tactile signals. To mitigate tactile data scarcity and the wear-prone nature of tactile sensors, we construct a hybrid large-scale dataset sourced from both high-fidelity digital twin and real-world experiments. By anticipating upcoming tactile states, DreamTacVLA acquires a rich model of contact physics and conditions its actions on both real observations and imagined consequences. Across contact-rich manipulation tasks, it outperforms state-of-the-art VLA baselines, achieving up to 95% success, highlighting the importance of understanding physical contact for robust, touch-aware robotic agents.

Source

Originally published at arxiv.org.

Related Articles

CD
Robos News Newsroom

Robos News covers markets, crypto and commodities for Asia & the Middle East — tier-1 desk research, AI-driven analysis, institutional-grade data. Tip our newsroom: [email protected]

Email the newsroom →
Disclaimer: This article is for informational purposes only and does not constitute investment advice. Data may be delayed up to 15 minutes. Past performance is not indicative of future results. Consult a licensed financial advisor before making investment decisions.

Related Stories

More from News →