Micro-Swarm Locomotion Optimization in Dynamic Flow using Multi-Objective Multi-Agent Reinforcement Learning

Published June 15, 2026 · Category: Robotics

Overview

arXiv:2605.25025v2

Coordinating micro-robotic swarms in realistic, time-dependent fluid environments remains a major challenge for biomedical and environmental applications. We present a hybrid CFD-MO-MARL (Computational Fluid Dynamics-Multi Objective-Multi Agent Reinforcement Learning) framework that couples a high-fidelity incompressible Navier-Stokes solver with decentralized proximal policy optimization to learn swarm control policies in oscillatory flow. Sixteen magnetically actuated micro-robots were simulated to navigate a pulsatile arterial waveform within a 2 mm channel while jointly optimizing upstream progression, energy efficiency, and motion smoothness. Conflicting objectives are resolved using Projected Conflicting Gradient (PCGrad) surgery. Without PCGrad, energy and smoothness rewards collapse during training, demonstrating that gradient conflict resolution is essential for stable multi-objective learning. The converged policy achieves progress rewards of 6.5-7.0, energy efficiency of 0.63-0.65, and smoothness of 0.97-0.99, outperforming brute-force baselines by more than 8 reward units on the primary objective. Training reveals three emergent behaviors not encoded in the reward function: hydrodynamic throttling formations that reduce peak flow velocities, a cycle-synchronized ratchet mechanism that exploits flow reversals for upstream movement, and individualized final-approach strategies near the target boundary. These results demonstrate that physically realistic fluid-agent interactions can be integrated directly into multi-objective reinforcement learning, providing a scalable framework for micro-swarm control in biomedical navigation, environmental monitoring, and microfluidic systems.
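
The abstract names PCGrad as the mechanism that keeps the three objectives from undermining one another, but gives no implementation details. As a rough illustration only, the sketch below shows the generic PCGrad projection step in NumPy: when two objective gradients point in conflicting directions (negative dot product), the conflicting component is projected out before the gradients are summed. The function name pcgrad, the flat-vector representation of gradients, and the example numbers are all assumptions made for illustration; they are not taken from the paper, whose actual policy networks and training loop are not described here.

```python
import numpy as np


def pcgrad(grads, rng=None):
    """Combine per-objective gradients using PCGrad-style projection.

    grads: list of 1-D arrays, one flattened gradient per objective.
    Returns the sum of the projected gradients.
    (Hypothetical helper; not the paper's implementation.)
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i, g_i in enumerate(grads):
        g = g_i.copy()
        others = [j for j in range(len(grads)) if j != i]
        # Visit the other objectives in random order.
        for j in rng.permutation(others):
            g_j = grads[j]
            dot = g @ g_j
            if dot < 0.0:
                # Conflict: remove the component of g along g_j.
                g -= dot / (g_j @ g_j) * g_j
        projected.append(g)
    return np.sum(projected, axis=0)


# Hypothetical gradients for the three objectives named in the abstract.
g_progress = np.array([1.0, 0.0, 0.5])
g_energy = np.array([-1.0, 1.0, 0.0])
g_smoothness = np.array([0.0, 0.5, 0.5])

update = pcgrad([g_progress, g_energy, g_smoothness],
                rng=np.random.default_rng(0))
```

In this sketch each objective's gradient is compared against the original, unprojected gradients of the other objectives, and only conflicting pairs are modified, so objectives that already agree are simply summed.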

Source

Originally published at arxiv.org.
