The closer you look at AI agents, the more ways they seem able to go wrong, so Microsoft has extended its taxonomy of failure modes to support discussion of possible fixes.
Microsoft has identified seven new failure modes in agentic AI systems, in addition to those it identified last year in its first Taxonomy of Failure Modes in Agentic AI Systems.
Four things contributed to the growing list of ways agentic AI can go wrong: the speed at which the technology went mainstream, the growing maturity of the Model Context Protocol (MCP) ecosystem, the rise of computer-use agents, and the accumulation of empirical evidence as researchers gathered more real-world findings.
The seven new failure modes it has identified are:
- Agentic Supply Chain Compromise: agent behavior can be altered through natural-language content rather than malicious code;
- Goal Hijacking: adversarial instructions appear aligned with legitimate task completion while silently redirecting the agent’s terminal goal;
- Inter-Agent Trust Escalation: a compromised agent asserts a false identity or inflates its claimed permissions to an orchestrator;
- Computer Use Agent (CUA) Visual Attack: agents operating through graphical interfaces can be manipulated through on-screen content that carries adversarial instructions for the agent;
- Session Context Contamination: an adversary introduces data that biases the agent’s reasoning in subsequent steps, without triggering safety controls at any individual step;
- MCP / Plugin Abuse: an update to the original taxonomy’s coverage of function compromise, focused on attack surfaces specific to MCP and plugin protocols;
- Capability / Architecture Disclosure: an agent reveals internal implementation details such as tool names and schemas, system-prompt structure, memory interfaces, or consent/human-in-the-loop trigger logic.
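For teams that want to track these seven modes in tooling, one option is to encode them as an enumeration and check which ones each agent has been tested against. The sketch below is only an illustration: the agent names and the shape of the coverage matrix are hypothetical and do not come from Microsoft's taxonomy.

```python
from enum import Enum


class FailureMode(Enum):
    """The seven new failure modes named in the taxonomy update."""
    SUPPLY_CHAIN_COMPROMISE = "Agentic Supply Chain Compromise"
    GOAL_HIJACKING = "Goal Hijacking"
    INTER_AGENT_TRUST_ESCALATION = "Inter-Agent Trust Escalation"
    CUA_VISUAL_ATTACK = "Computer Use Agent (CUA) Visual Attack"
    SESSION_CONTEXT_CONTAMINATION = "Session Context Contamination"
    MCP_PLUGIN_ABUSE = "MCP / Plugin Abuse"
    CAPABILITY_DISCLOSURE = "Capability / Architecture Disclosure"


def coverage_gaps(matrix: dict[str, set[FailureMode]]) -> dict[str, list[str]]:
    """Return, per agent, the failure modes that have no red-team test yet."""
    gaps = {}
    for agent, tested in matrix.items():
        missing = [mode.value for mode in FailureMode if mode not in tested]
        if missing:
            gaps[agent] = missing
    return gaps


# Hypothetical agents and test coverage, for illustration only.
matrix = {
    "billing-agent": {FailureMode.GOAL_HIJACKING, FailureMode.MCP_PLUGIN_ABUSE},
    "browser-agent": set(FailureMode),  # every mode already covered
}

for agent, missing in coverage_gaps(matrix).items():
    print(f"{agent}: untested -> {', '.join(missing)}")
```

In this example only `billing-agent` is reported, with five untested modes, because `browser-agent` is marked as covered for all seven.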
Microsoft advises security teams that use these definitions in their planning to take four steps: inventory their agentic supply chain, generating a software bill of materials (SBOM) for every deployed agent; verify agent identity cryptographically, not positionally, by issuing attestable credentials at provisioning; add the seven new failure modes to their red-team coverage matrix; and audit the human-in-the-loop user experience as a security control.
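To make the difference between cryptographic and positional identity concrete, here is a minimal sketch in which an orchestrator accepts an agent's claimed permissions only if they match a credential issued at provisioning. It assumes a single HMAC key shared by the provisioning service and the orchestrator, which is a simplification; a production system would more likely use asymmetric signatures or hardware-backed attestation. The function names and credential layout are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Assumption: one secret key held by both the provisioning service and the
# orchestrator. Real deployments would more likely use asymmetric signatures.
PROVISIONING_KEY = secrets.token_bytes(32)


def _tag(agent_id: str, permissions: list[str]) -> str:
    """Compute an HMAC-SHA256 tag over the agent's identity and permissions."""
    payload = json.dumps(
        {"agent_id": agent_id, "permissions": sorted(permissions)},
        sort_keys=True,
    ).encode("utf-8")
    return hmac.new(PROVISIONING_KEY, payload, hashlib.sha256).hexdigest()


def issue_credential(agent_id: str, permissions: list[str]) -> dict:
    """Provisioning step: bind an agent ID to its permissions with a tag."""
    return {
        "agent_id": agent_id,
        "permissions": sorted(permissions),
        "tag": _tag(agent_id, permissions),
    }


def verify_credential(credential: dict) -> bool:
    """Orchestrator step: recompute the tag instead of trusting the claim."""
    expected = _tag(credential["agent_id"], credential["permissions"])
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, credential["tag"])


# A legitimately provisioned agent (hypothetical name and permission strings).
credential = issue_credential("summarizer", ["read:docs"])

# A compromised agent inflates its claimed permissions but cannot forge the tag.
forged = dict(credential, permissions=["read:docs", "write:payments"])

print(verify_credential(credential))  # True
print(verify_credential(forged))      # False
```

The inflated claim fails verification because the tag was computed over the original permission set, which is the kind of check that Inter-Agent Trust Escalation tries to sidestep when an orchestrator trusts an agent simply because of where it sits in the pipeline.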
This article first appeared on InfoWorld.