Turning Drift into Constraint: Robust Reasoning Alignment in Non-Stationary Environments
…Xiaoyu Yang

Abstract: A novel framework called Autonomous Preference Optimization (APO) is proposed to address reasoning alignment challenges in multi-modal large language models under concept drift conditions, achieving improved robustness and…