Agentic Security: Copilot Exfiltration and AI Vulnerability Hunting

The rapid integration of autonomous AI agents into enterprise environments is exposing critical new security boundaries. A striking demonstration of this is a vulnerability in Microsoft Copilot Cowork, where attackers can use indirect prompt injection via poisoned "skills" (custom text-based plugins) to exfiltrate sensitive files. By exploiting Cowork's design—which automatically approves sending Teams and Outlook messages to the active user without human confirmation—an injection can force the agent to retrieve pre-authenticated SharePoint download links and embed them in malicious HTML image tags. When the user views their Teams messages, their client automatically renders the image, exfiltrating the private download links to an attacker-controlled server. This incident highlights a fundamental flaw in current LLM-agent architectures: they do not separate data from code, and granting them broad, delegable tool-calling permissions is an inherently dangerous paradigm.¹²³

Conversely, agentic workflows are also demonstrating immense power on the defensive side. Anthropic's Claude (working with the Mythos preview research team) successfully discovered a critical integer overflow vulnerability in the Apple macOS kernel (CVE-2026-28952), which was subsequently patched in macOS Tahoe 26.5. This milestone has sparked a debate on the future of software maintenance. Some engineers predict a 24/7 security arms race where dedicated agents continuously fuzz and audit codebases, forcing software publishers to either adopt a rapid "bleeding edge" update model or double down on robust Long Term Support (LTS) releases to manage the constant influx of patches.

An instance of Enterprise security policies fail when visual exploits blind parsers to hijack agent authorizations. — It details how automated tool permissions inside agent pipelines can be hijacked via external text inputs to exfiltrate secured company files. ↩︎
An instance of Your enterprise security boundary is undefinable when autonomous agents execute inside the network. — Attackers can exploit indirect prompt injections to manipulate autonomous agent authorizations and exfiltrate credentials because the security boundary cannot be statically defined. ↩︎
An instance of Agent autonomy is no defense against strict liability for AI hallucinations. — Corporate deployment of autonomous capabilities creates material liability when prompt injections force agents to execute unconfirmed, destructive behaviors. ↩︎

Agentic Security: Copilot Exfiltration and AI Vulnerability Hunting

Sources

Agentic Security: Copilot Exfiltration and AI Vulnerability Hunting

Revision history