LLMs believe false statements even after explicit warnings that they're false
…While the base models showed no tendency toward this kind of misaligned behavior prior to the new training, the fine-tuned models showed “comparable” misalignment rates regardless of whether those behaviors were…