Who reported this story?

This story was reported by arXiv cs.RO.


CoRe: Combined Rewards with Vision-Language Model Feedback for Preference-Aligned Reinforcement Learning

Robos News Newsroom

Editorial Desk

2026-07-03 · 2 min read

Published July 3, 2026 · Category: Robotics

Overview

arXiv:2607.01721v1. Abstract: Reward design remains a central challenge in reinforcement learning (RL). Hand-crafted rewards are often difficult to specify and may lead to suboptimal policies, while rewards learned from preferences can suffer from inefficiency and unstable training. Inspired by the dual nature of human learning explored in cognitive science, we decompose rewards into two complementary components: Formal Rewards (FR), explicitly designed based on task knowledge, and Residual Rewards (RR), learned from observations to capture implicit and nuanced preferences. Based on this decomposition, we propose CoRe, a hybrid framework that integrates FR and RR with vision-language model (VLM) feedback to achieve preference-aligned policies without human involvement. Our contributions are twofold: (1) We propose a Formal Reward Module (FRM) that leverages VLMs to iteratively design and optimize FR based on task knowledge and preference feedback, enabling continual improvement of the policy during training; (2) We introduce a Residual Reward Module (RRM) that learns RR from video-level preferences, employing VLMs to generate preference labels and capturing nuanced rewards that complement FR, ensuring alignment with human intent. Through the synergy of FRM and RRM, CoRe enables the automatic construction of reliable rewards that are efficient and preference-aligned. Extensive experiments demonstrate that CoRe outperforms existing approaches in policy learning effectiveness and efficiency on ten robotic manipulation tasks in simulation and five real-world tasks. Videos can be found on the project website: https://core-2026.github.io/
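
To make the FR/RR decomposition concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the two components are combined or which loss trains the residual term, so the additive combination, the `scale` parameter, the linear residual model, and the Bradley-Terry preference loss below are all assumptions chosen for illustration. Every function name is hypothetical, and the preference label that a VLM would supply is passed in as a plain number.

```python
import math
from typing import Callable, Sequence

State = Sequence[float]


def formal_reward(state: State) -> float:
    """Hypothetical hand-specified term: negative Euclidean distance to a goal at the origin."""
    return -math.sqrt(sum(x * x for x in state))


def make_residual_reward(weights: Sequence[float]) -> Callable[[State], float]:
    """Hypothetical learned term: a linear model standing in for a trained network."""
    def residual(state: State) -> float:
        return sum(w * x for w, x in zip(weights, state))
    return residual


def combined_reward(state: State, residual: Callable[[State], float], scale: float = 1.0) -> float:
    """Assumed combination rule: formal reward plus a scaled residual reward."""
    return formal_reward(state) + scale * residual(state)


def preference_loss(
    residual: Callable[[State], float],
    clip_a: Sequence[State],
    clip_b: Sequence[State],
    label: float,
) -> float:
    """Bradley-Terry cross-entropy over two clips (assumed loss, not from the abstract).

    label is 1.0 if clip_a is preferred and 0.0 if clip_b is preferred.
    """
    return_a = sum(residual(s) for s in clip_a)
    return_b = sum(residual(s) for s in clip_b)
    # Probability that clip_a is preferred: sigmoid(return_a - return_b).
    p_a = 1.0 / (1.0 + math.exp(return_b - return_a))
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps) + (1.0 - label) * math.log(1.0 - p_a + eps))


if __name__ == "__main__":
    residual = make_residual_reward([0.5, -0.2])
    print(combined_reward([3.0, 4.0], residual))
    print(preference_loss(residual, [[1.0, 0.0]], [[0.0, 1.0]], label=1.0))
```

In this sketch the policy would be trained on `combined_reward`, while `preference_loss` would be minimized over pairs of clips to fit the residual term. A residual model that already ranks the preferred clip higher yields a loss below log 2, the value obtained when both clips score equally.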

Source

Originally published at arxiv.org.

Source: https://arxiv.org/abs/2607.01721

