🤖 Humanoid 🦾 Industrial & Cobot 🚚 AGV / AMR 🐕 Quadruped ⚙️ Reducers · Servos · Sensors 🚁 Drones & Autonomy 🧠 Embodied AI
Robos News
Robotics

Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

arXiv:2407.15283v2 Announce Type: replace-cross Abstract: Industry is moving toward autonomous, network-connected machines that detect and adapt to changing conditions, including hardware faults. Conventional fault-tolerant design duplicates hardware and reroutes control logic; reinforcement learning (RL) offers a learning-based alternative. This paper presents the first systematic comparison of two RL algorithms -- Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) -- for integrati

Published July 2, 2026 · Category: Robotics

Overview

arXiv:2407.15283v2 Announce Type: replace-cross Abstract: Industry is moving toward autonomous, network-connected machines that detect and adapt to changing conditions, including hardware faults. Conventional fault-tolerant design duplicates hardware and reroutes control logic; reinforcement learning (RL) offers a learning-based alternative. This paper presents the first systematic comparison of two RL algorithms -- Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) -- for integrating fault tolerance into control. Beyond algorithm choice, we investigate four knowledge-transfer strategies: retaining or discarding model parameters, and retaining or discarding storage contents. Performance is evaluated in two Gymnasium environments: Ant-v5 and FetchReachDense-v3. Results show rapid, fault-specific recovery with clear trade-offs. In Ant-v5, retaining PPO's parameters boosts early returns and remains the safest choice across all faults, while retaining SAC's parameters yields mixed outcomes. SAC's early performance further depends on whether the replay buffer is retained: beneficial when prior experiences match current dynamics, but harmful when they diverge. In FetchReachDense-v3, discarding both PPO's and SAC's parameters was most effective under sensor corruption. Across tasks, both algorithms recover near-normal performance within minutes in low-dimensional settings and within days in high-dimensional settings, highlighting a clear trade-off between adaptation speed and asymptotic performance. These findings demonstrate that RL can deliver robust fault tolerance and offer practical guidelines.

Source

Originally published at arxiv.org.

Related Articles

CD
Robos News Newsroom

Robos News covers markets, crypto and commodities for Asia & the Middle East — tier-1 desk research, AI-driven analysis, institutional-grade data. Tip our newsroom: [email protected]

Email the newsroom →
Disclaimer: This article is for informational purposes only and does not constitute investment advice. Data may be delayed up to 15 minutes. Past performance is not indicative of future results. Consult a licensed financial advisor before making investment decisions.

Related Stories

More from News →