Most security leaders have a clear mental model of their attack surface: the systems, data, endpoints and identities an attacker can see, reach or abuse. Generative AI, however, has changed that model itself: it mixes untrusted content with executable instructions, introduces probabilistic decision-making into critical workflows and exposes high-value assets to a class of attacks that traditional controls were never designed to stop.
Agentic AI compounds this further. Where a standard generative AI tool offers a response, an autonomous agent perceives inputs, plans and takes action, calling APIs, writing to systems and triggering workflows. The same structural vulnerabilities that apply to language models apply to agents, but the blast radius is larger because the model is no longer just answering; it is doing.
The first step lies in understanding where the exposure sits. This article maps the AI attack surface across four layers and explores where agentic systems raise the stakes.
The four layers of AI exposure
AI systems blur the boundary between instructions and untrusted content, meaning controls built on clean separation of data and commands no longer hold in the same way. Rather than assuming risk can be eliminated, security leaders need to reduce impact, constrain behaviour and monitor for exploitation.
Together, these layers map to the practical work of architecting and fortifying AI systems, applying zero trust to agents, and building monitor-and-recover operations.
Layer 1: Inputs
When an LLM or RAG pipeline processes user prompts, emails, documents, web pages, support tickets or knowledge base articles, all this content becomes executable influence, meaning the model may treat it as instructions, not just data.
This opens the door to indirect prompt injection: malicious instructions embedded in routine content that the model then follows. An attacker does not need access to the system; they only need their content to reach the model. A poisoned support ticket, a document with hidden instructions or a webpage loaded during research: each is a potential vector.
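One partial mitigation is to screen untrusted content for instruction-like phrasing before it enters the model's context. The sketch below is a minimal heuristic filter; the pattern list is illustrative, not exhaustive, and in practice this would be layered with provenance checks, output filtering and tool gating rather than relied on alone.

```python
import re

# Illustrative (non-exhaustive) patterns that suggest embedded instructions.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the above",
    r"you are now",
    r"system prompt",
]

def flag_untrusted_content(text: str) -> bool:
    """Return True if the content looks like it embeds instructions
    aimed at the model rather than ordinary data."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS_PATTERNS)
```

Flagged content can then be quarantined, stripped or routed for human review before retrieval-augmented generation uses it.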
In an agentic context, this risk escalates. When a model can call tools, send emails, query databases or execute code, a successfully injected instruction does not merely produce a misleading answer; it triggers a real-world action. As the agent’s capability increases, so does the risk.
Layer 2: The model
Models themselves are assets with meaningful attack value, with risks including:
- Privacy leakage: Models trained on sensitive data can surface that data in outputs, even without a deliberate attack.
- Training data extraction: Research has shown that carefully constructed queries can extract hundreds of verbatim sequences from a large language model, including personally identifiable information. Leaders should plan controls that reduce both the likelihood and impact of this kind of leakage.
- Model extraction (IP theft): A form of IP theft where adversaries use high-volume querying to replicate a model’s behaviour, and in some cases, infer aspects of its design.
For organisations building or fine-tuning proprietary models, the model layer is a confidentiality and competitive risk, not just an operational one.
Layer 3: Data and knowledge stores
Training sets, embedding stores, vector databases and context caches introduce integrity risk. If an attacker can influence what goes into your knowledge base through a poisoned document, a compromised data source or a manipulated retrieval result, they can also degrade the reliability of every downstream decision the model makes.
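A basic integrity control at this layer is an ingestion gate: admit a document into the knowledge store only when both its source and its content hash check out. The sketch below assumes a hypothetical set of trusted source names and a registered SHA-256 per document; real pipelines would add signing, versioning and review workflows on top.

```python
import hashlib

# Hypothetical allowlist of sources permitted to feed the knowledge base.
TRUSTED_SOURCES = {"policy-portal", "hr-wiki"}

def admit_document(source: str, content: bytes, expected_sha256: str) -> bool:
    """Admit a document only if it comes from a trusted source AND its
    content matches the hash registered at the time of approval."""
    if source not in TRUSTED_SOURCES:
        return False
    return hashlib.sha256(content).hexdigest() == expected_sha256
```

Rejecting documents that fail either check narrows the path for poisoned content to reach retrieval results.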
Poisoning attacks target the training or fine-tuning process, while evasion attacks manipulate inputs at inference time, and privacy attacks exploit the model's memorisation. These risks arise during design, deployment and operation, not only during training. A model that is clean on release can be compromised through its data pipeline later.
Layer 4: Integrations, agents and economics
The fourth layer is where AI connects to the rest of your business: APIs, plugins, connectors, email systems, CRM platforms, cloud services and code repositories. Each integration is a potential path for data exfiltration or lateral movement following a successful prompt injection. For agentic systems, this layer can convert a model vulnerability into a business incident. The tools an agent can call define the scope of the damage it can cause if compromised. Treat agent permissions like any other privileged identity, and limit tools, data scopes and actions to the minimum needed.
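In practice, "treat agent permissions like a privileged identity" can start as simply as a deny-by-default tool allowlist per agent. The agent and tool names below are hypothetical; the point is the shape of the control, not a specific product API.

```python
# Hypothetical per-agent tool policy: each agent identity is granted an
# explicit set of tools, and anything outside that set is denied.
AGENT_TOOL_POLICY = {
    "support-triage-agent": {"read_ticket", "post_internal_note"},
    "billing-report-agent": {"query_invoices"},
}

def authorise_tool_call(agent_id: str, tool: str) -> bool:
    """Deny by default; allow only tools explicitly granted to this agent."""
    return tool in AGENT_TOOL_POLICY.get(agent_id, set())
```

An unknown agent, or a known agent requesting a tool outside its scope, is refused before the call executes, which is where the blast radius of an injected instruction is actually contained.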
Meanwhile, agent loops that run without human oversight, or applications that fail to enforce query limits, can generate unbounded API consumption, creating financial shock and operational disruption as a denial-of-service variant.
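A hard spend ceiling on each agent run is a straightforward guard against that failure mode. The sketch below is a minimal cost circuit breaker with placeholder values; real deployments would pair it with iteration caps and alerting.

```python
class BudgetGuard:
    """Track accumulated cost for an agent run and refuse further calls
    once a hard budget would be exceeded. Limits are placeholders."""

    def __init__(self, max_cost: float):
        self.max_cost = max_cost
        self.spent = 0.0

    def charge(self, cost: float) -> bool:
        """Return True and record the cost if within budget, else False."""
        if self.spent + cost > self.max_cost:
            return False
        self.spent += cost
        return True
```

When `charge` returns False, the loop halts and escalates rather than silently accumulating API spend.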
This is where zero trust principles need extending to agents, with explicit verification, least privilege and continuous policy enforcement for every tool call. It is also where operating model discipline matters most, with monitoring and recovery processes that can spot abnormal behaviour quickly and contain it before it becomes an incident.
What this means for security leaders
The organisations that will manage AI risk well are those that map their exposure first, design their controls around realistic threat scenarios, and build operations that can detect and respond when things go wrong. To move from exposure mapping to risk reduction, most organisations start by hardening the AI landing zone: identity and access management for people and agents, centralised logging and monitoring, and guardrails for usage and cost. Other early priorities include surfacing shadow AI that sits outside approved tooling and governance.
These four layers provide a starting point for that conversation. To learn more about implementation, architecture and the operating model changes, download our e-book and find out if your enterprise security profile is AI-ready.
