How we estimate the risk from prompt injection attacks on AI systems (Google Online Security Blog)
Posted by the Agentic AI Security Team Modern AI systems, like Gemini, are more capable than ever, helping retrieve data and perform actions on behalf of users. However, data from external sources present new security challenges if untrusted sources are available to execute instructions on AI systems. Attackers can take advantage of this by hiding malicious instructions in data that are likely to be retrieved by the AI system, to manipulate its behavior. This type of attack is commonly referred to as an "indirect prompt injection," a term first coined by Kai Greshake and the NVIDIA team. To mitigate the risk posed by this class of attacks, we are actively deploying defenses within our AI systems along with measurement and monitoring tools. One of these tools is a robust evaluation…