How we contain Claude across products
… This category includes both prompt injection and conventional attacks on the agent's runtime, orchestration layer, or proxy. …
… Defending against attacks: Prompt injections are malicious instructions hidden inside the content that an agent is asked to process. …
… Because the input is identical apart from the final instruction, stage 2's prompt is almost entirely a cache hit from stage 1. Why the prompt-injection probe matters: The transcript classifier's injection defense is structural, because it never sees tool results. …
… You can find out more about how to mitigate prompt injections and other safety concerns in our API docs. …
… With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. …
… In the coupled design, any untrusted code that Claude generated was run in the same container as credentials—so a prompt injection only had to convince Claude to read its own environment. …
… On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; on others, such as its tendency to give overly detailed harm-reduction advice on controlled substances, Opus 4.7 is modestly weaker. …