Recovering Hidden Reward in Diffusion-Based Policies
…Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how…
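
To make the term "conservative" concrete: a vector field is conservative when it is the gradient of some scalar potential, and on a simply connected domain this is equivalent to its curl vanishing. The sketch below is only an illustration under stated assumptions, not the paper's method. The 2D `potential`, the two example fields, and the `curl_2d` helper are all hypothetical names chosen for this example.

```python
import math


def potential(x, y):
    # Hypothetical scalar potential standing in for a reward; not taken from the paper.
    return -(x - 1.0) ** 2 - 0.5 * (y + 2.0) ** 2 + math.sin(x * y)


def conservative_field(x, y):
    # Analytic gradient of `potential`, so this field is conservative by construction.
    fx = -2.0 * (x - 1.0) + y * math.cos(x * y)
    fy = -(y + 2.0) + x * math.cos(x * y)
    return fx, fy


def rotational_field(x, y):
    # Pure rotation, included for contrast: no scalar potential generates it.
    return -y, x


def curl_2d(field, x, y, h=1e-3):
    # Scalar curl d(f_y)/dx - d(f_x)/dy, estimated with central finite differences.
    dfy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2.0 * h)
    dfx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2.0 * h)
    return dfy_dx - dfx_dy


if __name__ == "__main__":
    point = (0.3, -0.7)
    print("curl of gradient field:", curl_2d(conservative_field, *point))
    print("curl of rotational field:", curl_2d(rotational_field, *point))
```

The gradient field's curl should be zero up to finite-difference error, while the rotational field's is not. Restricting a model to fields of the first kind is the sort of constraint the abstract refers to.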