Useful Memories Become Faulty When Continuously Updated by LLMs
…More surprisingly, even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of ARC-AGI problems it had previously solved without memory. We trace the regression…