Lessons at the Intersection of Security and AI
… For prompt injection, where people try to trick your model out of alignment once it’s been deployed, the industry has been trying to mitigate these attacks through adversarial training . …
… For prompt injection, where people try to trick your model out of alignment once it’s been deployed, the industry has been trying to mitigate these attacks through adversarial training . …
… And they are also vulnerable to an emerging type of security threat known as prompt injections, in which an attacker uses a malicious input to elicit an unintended response or data breach from the model. “All of these challenges can make it tricky for organizations to gain traction with LLMs. …
… You could additionally on that input side, scan your prompt before giving it to the LLM for these kind of prompt injections that we talked about. So our guardrail would actually scan that prompt and output a score or a probability of a prompt injection. …