Teaching Claude why
… Our main two hypotheses were: Our post-training process was accidentally encouraging this behavior with misaligned rewards. This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. …