Paper page - Recovering Hidden Reward in Diffusion-Based Policies
…curious how robust the reward recovery is when the max-entropy assumption is only approximately satisfied in real expert data. The core move, using the energy gradient as the denoising field and…
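A toy sketch of why the energy-gradient move can recover a reward, in a one-state setting. It assumes a Boltzmann (max-entropy) expert π(a) ∝ exp(r(a)/α). Under that assumption ∇_a log π(a) = ∇_a r(a)/α, so a denoising field that matches the score of the expert action distribution gives the reward gradient up to the temperature α. Integrating that gradient recovers r up to an additive constant. A trained denoiser learns the score of the noise-perturbed data, which approaches the data score as the noise goes to zero. Here a Gaussian fit stands in for that trained denoiser. The quadratic reward, `ALPHA`, and all helper names are hypothetical illustrations, not the paper's method or code.

```python
import math
import random

# Hypothetical setup (not from the paper): one state, 1-D action,
# quadratic reward peaked at a = 2, temperature ALPHA.
ALPHA = 0.5

def true_reward(a):
    return -(a - 2.0) ** 2

def sample_expert(n, alpha=ALPHA, seed=0):
    # A Boltzmann expert pi(a) ∝ exp(r(a)/alpha) with this quadratic r is
    # Gaussian with mean 2 and variance alpha/2.
    rng = random.Random(seed)
    std = math.sqrt(alpha / 2.0)
    return [rng.gauss(2.0, std) for _ in range(n)]

def fit_gaussian_score(actions):
    # Stand-in for a trained denoiser in the small-noise limit: fit a Gaussian
    # to the expert actions and return its score d/da log p(a).
    n = len(actions)
    mu = sum(actions) / n
    var = sum((x - mu) ** 2 for x in actions) / n
    return lambda a: -(a - mu) / var

def recover_reward(score, alpha, grid):
    # Under the max-entropy assumption, dr/da = alpha * score(a).
    # Integrate with the trapezoid rule; r(grid[0]) is pinned to 0 because
    # the reward is only identified up to an additive constant.
    r = [0.0]
    for a0, a1 in zip(grid, grid[1:]):
        r.append(r[-1] + alpha * 0.5 * (score(a0) + score(a1)) * (a1 - a0))
    return r

if __name__ == "__main__":
    grid = [0.1 * i for i in range(41)]
    target = [true_reward(a) - true_reward(grid[0]) for a in grid]
    # Expert data that exactly satisfies the assumption.
    rec = recover_reward(fit_gaussian_score(sample_expert(20000)), ALPHA, grid)
    print("max error:", max(abs(x - y) for x, y in zip(rec, target)))
    # Expert is sharper (true alpha = 0.25) than the assumed ALPHA.
    rec_bad = recover_reward(
        fit_gaussian_score(sample_expert(20000, alpha=0.25)), ALPHA, grid
    )
    print("ratio at a=2:", rec_bad[20] / target[20])
```

In this toy, exact Boltzmann data gives back the true reward up to sampling error. If the expert's real temperature differs from the assumed one, the recovered gradient is scaled by α_assumed/α_true. The curve keeps its shape but not its scale. That is one concrete way an expert that is only approximately max-entropy can bias the recovered reward.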