ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
…Reinforcement learning (RL) provides a principled framework for optimizing such sequential decision tasks, as path rewards can naturally capture both short-term acceptance and long-term guidance effectiveness. However, naively applying policy…