Subscribe

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Service

Google Strengthens Gemini AI With New Security Process

Google Strengthens Gemini AI With New Security Process Google Strengthens Gemini AI With New Security Process
IMAGE CREDITS: BGR

As agentic AI systems grow more capable, so too do the threats that target them. One of the most insidious dangers today is the adaptive indirect prompt injection (IPI) attack—an exploit that manipulates AI behavior through embedded instructions in seemingly benign content. Now, Google DeepMind (GDM) is taking a proactive, continuous approach to defend its flagship Gemini model from this evolving threat.

In a newly released white paper, Lessons from Defending Gemini Against Indirect Prompt Injections, DeepMind reveals how it has hardened Gemini 2.5 against adaptive IPI attacks, using automated red teaming, adversarial training, and layered defenses to reduce the success rate of these attacks significantly.

What Is an Indirect Prompt Injection Attack?

Unlike traditional attacks that require access to a model’s training data or architecture, IPI attacks work by exploiting the model’s ability to learn from the tools and environments it interacts with. In agentic AI systems—which are designed to act autonomously and learn on the fly—this creates a critical vulnerability.

Imagine an AI assistant managing your inbox. An attacker could embed malicious instructions in an email. If the model processes that content as input, it might learn and obey these hidden commands—such as sharing personal data, scheduling events, or sending out responses to specific keywords.

This method doesn’t break into the model—it tricks the AI into cooperating with the attacker.

Worse still are adaptive IPI attacks. These aren’t static, one-time tricks. They’re dynamic and iterative. If a particular embedded prompt fails, the attacker learns from the failure and adjusts. Over time, this process—called Tree of Attacks with Pruning (TAP)—zeroes in on effective attack vectors. TAP doesn’t require inside knowledge of the model—it just tests, refines, and repeats until it works.

In Gemini 2.0, TAP attacks succeeded in 99.8% of test cases. That number dropped to 53.6% in Gemini 2.5, thanks to GDM’s multi-layered defense strategy.

GDM’s Iterative Defense Model

Recognizing that a static defense won’t hold against adaptive attackers, Google DeepMind created a continuous learning loop for security:

  1. Automated Red Teaming (ART): GDM’s internal tools simulate real-world IPI attacks using adaptive techniques.
  2. Adversarial Training: When the ART suite succeeds in tricking Gemini, the model is retrained to recognize and ignore similar threats.
  3. Fine-Tuned Reinforcement: The model is updated with scenarios where malicious prompts are injected, ensuring it follows only user-authenticated requests.

This cycle of attack, test, retrain, repeat makes Gemini more resilient—yet GDM emphasizes that no defense is foolproof.

Interestingly, DeepMind found that combining defenses worked best. One successful method from Gemini 2.0 involved embedding a “Warning” in the model’s behavior: a system-level instruction telling the AI not to share private data when processing untrusted inputs.

When paired with adversarial training, this hybrid strategy yielded stronger resistance to adaptive prompt injection than either method alone. GDM’s findings suggest that the future of AI security lies in layered, complementary safeguards, not single-point solutions.

“We believe combining adversarial training and system-level defenses will substantially raise the cost and complexity for attackers,” GDM’s research team wrote. “It shifts the threat landscape toward more detectable or resource-intensive attacks, strengthening the overall security posture of agentic AI.”

Real-World Applications and Risk Mitigation

The practical impact is significant. In a simulated email-based prompt injection scenario, Gemini 2.5 was able to resist over 46% more attacks than its predecessor. That kind of progress matters—especially as AI systems are deployed in personal assistants, productivity tools, and sensitive communication workflows.

GDM has no illusions about complete immunity. “There is no silver bullet,” the white paper notes. “The goal is to raise the cost of attack and reduce the probability of success, just like in traditional cybersecurity.”

GDM will present its latest findings at the AI Risk Summit, taking place August 19–20 at the Ritz-Carlton, Half Moon Bay. There, researchers will share deeper insights into how adaptive threats are shaping the future of AI security—and how tools like Gemini 2.5 are leading the charge in defense.

Share with others