🤖 人形机器人 Humanoid 🦾 工业 & 协作 Industrial / Cobot 🚚 AGV / AMR 🐕 四足 Quadruped ⚙️ 减速器 · 伺服 · 传感器 📈 A股 · 港股 · 美股机器人板块 🧠 具身智能 Embodied AI 实时行情 M1 上线 →
Robos News
Robotics

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

arXiv:2606.10371v1 Announce Type: new Abstract: Diffusion-based action generation has become a foundational component of embodied AI, but its reliance on visual conditioning leaves deployed visuomotor policies vulnerable to adversarial manipulation. Most prior attacks focus on disruption: they perturb the observation stream to reduce task success or induce erratic behavior. We study a stronger threat, Test-time Adversarial Takeover (TAKO), in which an attacker obtains a real-time steering inter

Test-time Adversarial Takeover: A Real-time Hijacking Interface against Robotic Diffusion Policies

Published June 10, 2026 · Category: Robotics

Overview

arXiv:2606.10371v1 Announce Type: new Abstract: Diffusion-based action generation has become a foundational component of embodied AI, but its reliance on visual conditioning leaves deployed visuomotor policies vulnerable to adversarial manipulation. Most prior attacks focus on disruption: they perturb the observation stream to reduce task success or induce erratic behavior. We study a stronger threat, Test-time Adversarial Takeover (TAKO), in which an attacker obtains a real-time steering interface over a frozen robot policy and turns it into a remotely piloted instrument. TAKO learns a small vocabulary of reusable universal patches through differentiable diffusion inference; at test time, the attacker switches among these patches in the camera stream to compose attacker-chosen trajectories. This works because the perturbation acts on the visual conditioning pathway, where the induced bias can persist through iterative generative inference. We further show that the natural targeted baseline, target-policy matching, fails because the victim policy cannot reliably supervise itself on out-of-distribution target shifts. Across four tasks (2D manipulation, simulated aerial delivery, simulated ground navigation, and physical-world ground navigation), two visual encoders (ResNet-18 and EfficientNet-B0 + Transformer), and three generative inference families (DDPM, DDIM, and flow matching), human operators achieve 100\% takeover success on attacker-defined objectives in every evaluated setting. The project page is available at https://tako-attack.github.io.

Source

Originally published at arxiv.org.

Related Articles

CD
Robos News Newsroom

Robos News covers markets, crypto and commodities for Asia & the Middle East — tier-1 desk research, AI-driven analysis, institutional-grade data. Tip our newsroom: [email protected]

Email the newsroom →
Disclaimer: This article is for informational purposes only and does not constitute investment advice. Data may be delayed up to 15 minutes. Past performance is not indicative of future results. Consult a licensed financial advisor before making investment decisions.