RN-D: Discretized Categorical Actors for On-Policy Reinforcement Learning

Robos News Newsroom

Editorial Desk

Published June 25, 2026 · Category: Robotics

Overview

Abstract (arXiv:2601.23075v2): On-policy Reinforcement Learning (RL) remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies. This often leads to brittle optimization when gradients are noisy and policy updates must be conservative. In this paper, we revisit actor policy representation as a first-class design choice for on-policy RL. We study discretized categorical actors, which represent each action dimension as a distribution over discrete bins and induce a policy objective analogous to classification cross-entropy loss. Building on architectural advances from supervised learning, we further pair discretized categorical actors with regularized networks, yielding RN-D. Across diverse continuous-control benchmarks, we show that simply replacing the standard Gaussian actor with our proposed actor substantially improves performance, achieving state-of-the-art results within on-policy RL. We release our code at https://github.com/alwaysbyx/RND-RL.
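To make the idea in the abstract concrete, here is a minimal sketch of a discretized categorical actor in plain Python. It is not taken from the authors' repository. The bin count, the action bounds of [-1, 1], the use of evenly spaced bin centers, and all function names (`bin_centers`, `log_softmax`, `sample_action`, `surrogate_loss`) are assumptions made for illustration. The loss shown is a plain advantage-weighted log-likelihood, chosen only to show the resemblance to cross-entropy; the paper's actual on-policy objective and its regularized network are not reproduced here.

```python
import math
import random

NUM_BINS = 11          # hypothetical number of bins per action dimension
LOW, HIGH = -1.0, 1.0  # assumed action bounds


def bin_centers(num_bins=NUM_BINS, low=LOW, high=HIGH):
    """Evenly spaced bin centers covering [low, high]."""
    width = (high - low) / num_bins
    return [low + (i + 0.5) * width for i in range(num_bins)]


def log_softmax(logits):
    """Numerically stable log-softmax over one action dimension's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_action(logits_per_dim, rng):
    """Sample one bin per action dimension.

    Returns the continuous action (bin centers), the chosen bin indices,
    and the joint log-probability (sum over independent dimensions).
    """
    centers = bin_centers(len(logits_per_dim[0]))
    action, bins, log_prob = [], [], 0.0
    for logits in logits_per_dim:
        log_p = log_softmax(logits)
        weights = [math.exp(lp) for lp in log_p]
        idx = rng.choices(range(len(logits)), weights=weights)[0]
        bins.append(idx)
        action.append(centers[idx])
        log_prob += log_p[idx]
    return action, bins, log_prob


def surrogate_loss(logits_per_dim, bins, advantage):
    """Advantage-weighted cross-entropy: -A * sum_d log pi_d(bin_d).

    With advantage == 1 this is the classification cross-entropy of the
    sampled bins, summed over action dimensions.
    """
    log_prob = sum(log_softmax(l)[b] for l, b in zip(logits_per_dim, bins))
    return -advantage * log_prob
```

In use, a policy network would output one logit vector per action dimension; `sample_action(logits, random.Random(0))` then yields an action to execute, and `surrogate_loss` would be minimized with respect to the logits during the policy update.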

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2601.23075
