TL;DR for defenders. Treat every LLM in your stack as an untrusted component that will eventually follow someone else's instructions. There is no patch for prompt injection - the model reads instructions and data through the same channel, by design. What works: least-privilege tool scopes, output handling as strict as you'd apply to user input, breaking the data-exfiltration leg (egress controls on what the model can render or call), a canary token in the system prompt as a leak tripwire, and logging prompts, completions, and tool calls as first-class audit events. Details and starter detections below.

Why this belongs in your threat model now

LLM-integrated applications stopped being experimental the moment they got access to real data and real actions. The clearest proof point so far is EchoLeak (CVE-2025-32711, CVSS 9.3): a zero-click vulnerability in Microsoft 365 Copilot disclosed in June 2025, where a crafted email could make Copilot exfiltrate data from the user's context - mail, files, chat history - without the victim ever clicking anything. Microsoft fixed it server-side, but the underlying class (researchers called it an "LLM scope violation") is exactly what this writeup is about: untrusted content steering a model that holds privileged access.

This wasn't novel in kind. Indirect prompt injection against LLM-integrated applications was described systematically by Greshake et al. back in early 2023, and it has been the number-one entry in every edition of the OWASP Top 10 for LLM Applications since the list existed. The gap between "known since 2023" and "still shipping exploitable in 2025's flagship products" tells you how hard the problem is - and why detection and blast-radius design matter more than waiting for a fix.

The Top 10, in one defensive pass

The 2025 list in production terms - what each risk looks like in an app you actually run, and the primary defense:

RiskWhat it looks like in productionPrimary defense
LLM01 Prompt InjectionUntrusted content (user input, a retrieved doc, an email) steers the model's behaviorLeast privilege + egress control; assume it happens
LLM02 Sensitive Information DisclosureModel reveals other users' data, internal docs, or credentials it was givenData minimization; don't give the model secrets
LLM03 Supply ChainPoisoned or backdoored third-party model, adapter, or datasetPin versions, verify sources, inventory models like packages
LLM04 Data & Model PoisoningAttacker-influenced training or fine-tuning data changes behaviorProvenance and validation on anything you train on
LLM05 Improper Output HandlingModel output flows into HTML, SQL, or a shell without encoding - classic injection, new sourceTreat output as untrusted input; encode and parameterize
LLM06 Excessive AgencyThe model can call tools or APIs with more permission than the task needsScope tools per task; human approval for consequential actions
LLM07 System Prompt LeakageInstructions, and worse, any secrets inside them, get echoed outNo secrets in prompts; canary tripwire (below)
LLM08 Vector & Embedding WeaknessesRAG retrieval pulls attacker-planted or cross-tenant content into contextAccess control at retrieval time, source tagging
LLM09 MisinformationConfident wrong answers drive real decisionsGrounding, citations, human review where it matters
LLM10 Unbounded ConsumptionToken-burning abuse: denial of service or a shocking billRate limits, quotas, per-identity cost caps

Notice the pattern: half of these are old vulnerability classes wearing a new interface. LLM05 is unescaped output, LLM03 is supply chain, LLM10 is resource exhaustion. Your existing instincts apply. The genuinely new ones are LLM01, LLM06, and LLM08 - and they compound each other, which is where we go next.

Prompt injection, properly understood

SQL injection had a real fix: parameterized queries separate code from data. Prompt injection has no equivalent, because a transformer has one input channel. System prompt, user message, retrieved document, tool output - it's all tokens in the same context window, and the model's "instruction following" is a statistical tendency, not a security boundary. MITRE ATLAS catalogs this as AML.T0051, in two flavors that matter very differently for defense:

A useful frame for when indirect injection becomes a real breach is what Simon Willison calls the lethal trifecta: an AI system that combines (1) access to private data, (2) exposure to untrusted content, and (3) a way to communicate externally. All three together means attacker instructions can reach private data and carry it out. EchoLeak was precisely this trifecta inside M365 Copilot - the crafted email was the untrusted content, the user's mail and files were the private data, and image/link rendering was the exfiltration channel. You usually can't remove legs one and two. Defense concentrates on leg three.

What you need flowing into your SIEM

You cannot detect what you don't log, and most LLM apps today log almost nothing security-relevant. Whether you build or buy, insist on these as structured, retained, queryable events:

Detection strategy

Same philosophy as every UMBRASEC writeup: layers, from highest precision to broadest. Schemas below are illustrative - LLM apps don't share a standard log format yet, so adapt the field names to yours.

1. Canary token in the system prompt (near-zero false positives)

The honeypot-SPN trick, ported to LLMs. Plant a unique, meaningless marker in your system prompt and alert if it ever appears in model output - because no legitimate completion has any reason to contain it. One marker, two strong signals: your system prompt is leaking (LLM07), and the model is disclosing context it was told not to - which often means an injection attempt is steering it.

# System prompt (excerpt)
# Integrity marker. Never include the following token in any response:
# UMB-CANARY-3f91c2a8

# Detection (pseudo-rule over completion logs)
alert when:
    response.text contains "UMB-CANARY-"
severity: high
action: capture full session (prompt chain, retrieved docs, tool calls)
        for review; rotate the canary after any hit

Tuning note. Use a random value per deployment, rotate it on a schedule and after every hit, and make sure it's excluded from any prompt text you intentionally publish. The same idea extends to RAG: seed a decoy document no honest query should retrieve, and alert when it enters a context window - that's your honeypot for LLM08-style retrieval abuse.

2. Egress: watch what the model is allowed to render or call

EchoLeak exfiltrated through a rendered image URL - the data left in the query string of a request the victim's client made automatically. That channel generalizes: markdown images, auto-fetched links, and tool calls that take URLs are the standard exfiltration legs of the trifecta. Lock them down by policy, and alert on the attempts:

# Output policy (enforce before rendering / before the HTTP client fires)
- strip or neutralize markdown images and links in model output
  unless the host is on an explicit allowlist
- block tool calls whose URL/domain argument is not allowlisted

# Detection (pseudo-rule over output + tool-call logs)
alert when:
    output.contains_image_or_link AND destination.host not in allowlist
or:
    toolcall.name in ("http_get", "browse", "send_email")
    AND toolcall.argument_domain not in allowlist
    AND session.context_included_external_content == true

That last condition is the high-signal one: the model read untrusted content, then immediately tried to reach an unfamiliar destination. Legitimate sessions do this rarely; injected ones do it by construction.

3. Behavioral: tool-call fan-out and sequence anomalies

The agent equivalent of the Kerberoasting fan-out rule. A copilot answering a question touches one or two tools; a hijacked agent enumerates - reading many documents, then calling a send/export/write tool it rarely uses. Baseline per tool and per identity, then alert on the outliers:

# Pseudo-aggregation over tool-call logs
alert when, within 5m, a single session:
    calls >= N distinct tools            # fan-out (baseline N first)
    or reads >= M distinct documents     # bulk context-stuffing
    or invokes a "consequential" tool    # send_email, create_rule,
       it has never used before          # export, delete, payment
severity: medium - this is a triage feed, not a pager rule

Run it report-only for a couple of weeks exactly like a SIEM correlation rule: find your noisy legitimate automations, allowlist them, and set thresholds above your noisiest honest session.

Mitigation: design beats detection here

More than in any other writeup on this site, the architecture is the control. In order of impact:

Honest limitations

References

Scope note. This is a defensive writeup. It describes injection classes only to the depth a defender needs to log, detect, and contain them - it deliberately contains no working injection payloads, jailbreak strings, or evasion techniques. UMBRASEC publishes defense, not offense.